FAQ
Hi All,

I am working on *Cloudera Impala* with *MySQL* as the *metastore*. How do I
run *statestored* in Impala without Cloudera Manager?

I am able to run impalad on a single node successfully, and using
impala-shell we are able to get data from impalad.

Now we are trying to run the statestore for a 2-node cluster, and I do not
understand what the statestore does there.

I understand it when I implement it on a single node; I do not understand it
in a cluster environment.

Please explain all the necessary configuration steps to follow when you
implement it in a cluster environment.

Can anybody explain it clearly?

I got the error below:

E0306 21:45:45.764327 26084 statestored-main.cc:52] Could not start
webserver on port: 25010

I am stuck here.

Thanks in Advance.


  • bc Wong at Mar 6, 2013 at 8:11 pm
    CM dramatically reduces your setup complexity. So I'd recommend that.

    Adding impala-user for help with non-CM setup. First thing I'd check, given
    the error message, is port conflict.
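A quick way to test bc's port-conflict theory, assuming a stock Linux shell (the probe below uses bash's /dev/tcp; `netstat -ltnp` works too):

```shell
# Probe the statestore web port (25010 by default). If the probe connects,
# some process, often a stale statestored, already holds the port.
port=25010
if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
  echo "port $port in use"
else
  echo "port $port free"
fi
# Alternative: netstat -ltnp | grep ":$port" also shows which process owns it.
```

If the port is taken, either stop the stale daemon or start statestored with a different web-server port (the `-webserver_port` flag).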

    Cheers,
    bc
    On Wed, Mar 6, 2013 at 3:22 AM, Anil Kumar wrote:


  • Anil Kumar at Mar 7, 2013 at 4:50 am
    Hi Wong,

    1) Can impala-state-store delegate queries to impalads running on another
    host? Is there any load balancing?

    2) When we implement Impala in a cluster environment, do I need to load
    the same data onto all data nodes, or only onto the master node?


    Thanks in advance,


  • bc Wong at Mar 7, 2013 at 5:15 am

    On Wed, Mar 6, 2013 at 8:50 PM, Anil Kumar wrote:

    Hi Wong,

    1) Can impala-state-store delegate queries to impalads running on another
    host? Is there any load balancing?

    The state store does not perform query delegation. The client
    (impala-shell) picks an impalad to submit the query to. Any impalad will
    do. See `impala-shell --help'. The impalad that receives the query will
    distribute the query to other nodes based on various factors, like data
    locality.
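To make this concrete, a hedged sketch (`-i` and `-q` are standard impala-shell options; the host addresses and table name are the ones that appear later in this thread):

```shell
# Submit the same query through either impalad; the one you pick becomes the
# coordinator and distributes the work across the cluster.
impala-shell -i 10.219.197.8:21000 -q 'select count(*) from bidemo.product'
impala-shell -i 10.219.197.9:21000 -q 'select count(*) from bidemo.product'
```

Both invocations return the same result; there is no dedicated "query master" node.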

    2) When we implement Impala in a cluster environment, do I need to load
    the same data onto all data nodes, or only onto the master node?

    Neither. The short answer is: simply put the data in HDFS and stop worrying.

    You may have some misunderstanding about how HDFS works. You don't get to
    control data placement; HDFS itself does it. And Impala is smart w.r.t.
    data locality, so you don't have to worry about it. (We can talk about
    increasing the replication factor for hot files, but that's an advanced
    topic.)
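As a hedged sketch of "put the data in HDFS and stop worrying" (the paths and file name here are illustrative, not from the thread):

```shell
# Load the file once; HDFS decides block placement and replication for you.
hdfs dfs -mkdir -p /user/hive/warehouse/bidemo.db/product
hdfs dfs -put product.csv /user/hive/warehouse/bidemo.db/product/

# The advanced knob bc mentions: raise the replication factor of a hot file
# (-w waits until the new replication level is reached).
hdfs dfs -setrep -w 3 /user/hive/warehouse/bidemo.db/product/product.csv
```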

    Cheers,
    bc


  • Anil Kumar at Mar 7, 2013 at 6:34 am
    Hi *Wong,*

    Thank you for your suggestions; you clarified my doubts.

    I have 2 nodes in my cluster.

    One node I am treating as the namenode as well as a datanode, and the
    second node as another datanode.

    In this case it takes the master node as the namenode and the other node
    as a datanode, but it does not take the master as a datanode.

    I feel I am missing some configuration to make my master node a datanode.

    What configuration changes do I need so that my master node acts as both
    namenode and datanode?

    Please help me with this.

    Note: If I have a one-node cluster, it takes that node as both namenode
    and datanode; the problem appears when I add one more node to the cluster.


    Thanks in Advance,



  • bc Wong at Mar 7, 2013 at 8:49 am
    [bcc: impala-user]

    Since you're apparently not using Cloudera Manager, my first recommendation
    is for you to try it:
    https://ccp.cloudera.com/display/FREE45DOC/Cloudera+Manager+4.5+Free+Edition+Documentation.
    It takes care of basic setup problems like what you described.

    Cheers,
    bc

  • Anil Kumar at Mar 8, 2013 at 3:20 am
    Hi Wong,

    Thank you for the valuable suggestions.

    The problem with CM is that in my company software installations cannot
    reach internet sites, but CM does.

    So we shifted to downloading the rpm packages manually, and that worked.

    I have a 2-node cluster and impalad is running successfully on both
    systems. From one node I run a query using impala-shell.

    What happens is that the first node starts showing results, then suddenly
    stops printing and blocks, because the second node gives this error:

    E0308 08:29:15.963053 28519 impala-server.cc:1301] unknown query id:
    a8263f748d794d10:b9b5d81323298105

    May I know the reason, please?

    Thanks in Advance,
  • bc Wong at Mar 11, 2013 at 10:45 am

    On Thu, Mar 7, 2013 at 7:20 PM, Anil Kumar wrote:

    Hi wong,

    Thank you for giving the valuable suggestions.

    The problem with CM is, in my company from software installation it
    wont hit the internet sites but CM does.

    So we shifted to working on rpm packages download manually and did it.
    If you got the RPMs locally, you can set up a local CM mirror to install
    CM. CM can deploy CDH via parcels, which are a lot easier to mirror than
    packages. (This is very common, as most clusters don't have direct web
    access.)

    i have 2 node cluster both the systems impalad is running successfully.
    from one node i run the query using impala-shell.

    so what happens you know the first node it is showing results and
    suddenly stops printing the results and blocked because second node is
    giving error

    E0308 08:29:15.963053 28519 impala-server.cc:1301] unknown query id:
    a8263f748d794d10:b9b5d81323298105

    May i know the reason please
    I don't know enough about your environment to comment, and I've never seen
    that error before in any Impala managed by CM. However, if you really
    can't use CM, the impala-user list would be a better place to get your
    answer.

    Cheers,
    bc


  • Anil Kumar at Mar 16, 2013 at 2:32 pm
    Hi Wong,

    Thanks, Wong. As per your suggestion I installed CM successfully using the
    rpm packages needed for the CM installation, and started the services.

    Now I am stuck with the installation of CDH 4.2 and Impala 0.6 with that
    CM.

    You said it is easy to install with parcels.

    Can you please tell me how to create a local repository mirror with
    parcels for the CDH4 and Impala installation?

    If you could explain it step by step that would be great; we have wasted
    the last 2 months on this installation.

    Please help with a step-by-step process, so that we get a better idea.

    Note: we studied the document at
    https://ccp.cloudera.com/display/CDH4DOC/Creating+a+Local+Yum+Repository and
    some other documents, but we were not able to understand them.

    Waiting for your response.

    Thanks in Advance.
  • Anil Kumar at Mar 16, 2013 at 2:33 pm
    On Saturday, March 16, 2013 8:02:01 PM UTC+5:30, Anil Kumar wrote:
    Hi wong,


    Thanks Wong as per your suggestion i installed CM sucessfully using rpm
    packages which are needed for CM installation and started services .

    Now i stuck with installation of cdh4.2 and impala 0.6 with that CM.

    You said it is easy to install with parcels.

    Can you please tell me how to create local repository mirror with parcels
    for cdh4 and impala installation.

    If you explain step by step that would be great.we wasted time last 2
    months for this installation.

    Please help with step by step precess,so that we will get better idea.

    Note: we studied document available at this location
    https://ccp.cloudera.com/display/CDH4DOC/Creating+a+Local+Yum+Repository and
    some other documents but we are not able to get understand

    Waiting for your response.

    Thanks in Advance.

  • Anil Kumar at Mar 17, 2013 at 2:54 am
    Hi Wong,

    In the documentation at
    https://ccp.cloudera.com/display/CDH4DOC/Creating+a+Local+Yum+Repository
    they say:

    On your web server, go to /var/www/html/cdh/4/ and type the following
    command: createrepo .

    But when I went to /var/www/html/cdh/4/ from my terminal and typed
    createrepo . , the command was not recognized.

    How can I create the metadata? Am I doing anything wrong?

    Please guide me on this.

    Thanks in Advance.
  • Vikas Singh at Mar 17, 2013 at 5:47 am
    Hi Anil,

    Did you run the following command in step #2: "sudo yum install yum-utils
    createrepo"? This should install the createrepo tool. If you did install
    it, can you confirm the installation by running 'yum list installed | grep
    createrepo'. You should also find the command installed in
    '/usr/bin/createrepo'
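The checks above, in order, as a sketch for an RHEL/CentOS host with yum access to the createrepo package:

```shell
sudo yum install -y yum-utils createrepo   # step 2 of the local-yum-repo doc
yum list installed | grep createrepo       # confirm the package is present
ls -l /usr/bin/createrepo                  # the tool it installs
cd /var/www/html/cdh/4/ && createrepo .    # generate the repodata/ metadata
```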

    If you want to use parcels (recommended by Cloudera, and a lot simpler),
    here are the steps for creating a local parcel repository:

    If the Cloudera Manager server does not have Internet access, you can
    access the Cloudera parcels directories (at
    http://archive.cloudera.com/cdh4/parcels/ or
    http://beta.cloudera.com/impala/parcels/) from another location, and then
    drop the .parcel file into your local parcel-repo directory. You will also
    need to create a .sha file from the information found in the manifest.json
    file in the parcels directory for the parcel version you want to use.

    To make a parcel available for distribution on your cluster:

    1. Verify the location of the local parcel repository on your Cloudera
    Manager server:

    Go to the Administration page, Properties tab, Parcels category.
    You can change the local repository path in the Local Parcel Repository
    Path property. By default it is /opt/cloudera/parcel-repo.

    2. Go to Cloudera's parcel repository at
    http://archive.cloudera.com/cdh4/parcels/ or
    http://beta.cloudera.com/impala/parcels/.

    3. Go to the directory for the software version you want to make available
    to your cluster.

    4. Copy the .parcel file for your operating system: (el5 or el6 for Red Hat
    5 or 6, lucid or precise for Ubuntu and so on) and place it into the local
    parcel repository on your Cloudera Manager server.

    5. Open the manifest.json file in the same directory as the .parcel file
    you just copied.

    6. Find the section of the manifest that corresponds to the parcel you
    downloaded:

    For example, if you are running RHEL 6 and copied the parcel file
    CDH-4.2.0-1.cdh4.2.0.p0.10-el6.parcel, then you would look for the section:
    {
      "parcelName": "CDH-4.2.0-1.cdh4.2.0.p0.10-el6.parcel",
      "components": [
        { "name": "flume-ng",
          "version": "1.3.0-cdh4.2.0",
          "pkg_version": "1.3.0+86"
        },
        { "name": "mr1",
          "version": "2.0.0-mr1-cdh4.2.0",
          "pkg_version": "0.20.2+1341"
        },
        { "name": "hadoop-hdfs",
          "version": "2.0.0-cdh4.2.0",
          "pkg_version": "2.0.0+922"
        },
        . . . <snip> . . .
        { "name": "whirr",
          "version": "0.8.0-cdh4.2.0",
          "pkg_version": "0.8.0+21"
        },
        { "name": "zookeeper",
          "version": "3.4.5-cdh4.2.0",
          "pkg_version": "3.4.5+14"
        }
      ],
      "hash": "f1a08b5f7aeef6335d577c5f6fad0bca55f0c2d9"
    },

    7. Create a text file whose name is <parcel file name>.sha
    (e.g. CDH-4.2.0-1.cdh4.2.0.p0.10-el6.parcel.sha) and copy the hash code
    into it:
    e.g.
    # cat > CDH-4.2.0-1.cdh4.2.0.p0.10-el6.parcel.sha
    f1a08b5f7aeef6335d577c5f6fad0bca55f0c2d9
    ^D

    8. Place this file into your local parcel repository.


    Once these files are in place, Cloudera Manager will pick up the parcel and
    it will appear on the Hosts > Parcels page.
    *Note:* how quickly this occurs depends on the Parcel Update Frequency
    setting, set by default to 1 hour. You can change this on the
    Administration page, Properties tab, Parcels category.
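Steps 5 through 7 above can be sketched as a small helper, assuming a POSIX shell with awk; `extract_parcel_hash` is a name invented here, not a Cloudera tool:

```shell
# extract_parcel_hash MANIFEST PARCEL prints the "hash" value that
# manifest.json records for the named parcel.
extract_parcel_hash() {
  awk -v p="$2" '
    $0 ~ p { found = 1 }          # reached the entry for our parcel
    found && /"hash"/ {           # the first "hash" after it is its checksum
      gsub(/.*"hash": *"|".*/, ""); print; exit
    }' "$1"
}

# Typical use inside the local parcel repository (paths are placeholders):
# cd /opt/cloudera/parcel-repo
# extract_parcel_hash manifest.json CDH-4.2.0-1.cdh4.2.0.p0.10-el6.parcel \
#   > CDH-4.2.0-1.cdh4.2.0.p0.10-el6.parcel.sha
```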


    Cheers,
    Vikas
  • Anil Kumar at Mar 18, 2013 at 10:44 am
    Hi Vikas,

    Thank you for your valuable solutions.

    I have done what you mentioned in the above post.

    I am getting the error below:

    Invalid local parcel repository path configured: http://10.219.197.9/cdh/4/

    In this local repository I have:

    CDH-4.2.0-1.cdh4.2.0.p0.10-el6.parcel 18-Mar-2013 12:28 664M
    [ ] CDH-4.2.0-1.cdh4.2.0.p0.10-el6.parcel.sha 18-Mar-2013 12:35 41
    [DIR] repodata/

    but this is not being accepted as a parcel repository.

    I am stuck here. Any suggestions?

    Thanks,
  • Vikas Singh at Mar 18, 2013 at 4:12 pm
    Hi Anil,

    http://10.219.197.9/cdh/4/ is not a correct local repo path (it has to be
    a directory on the filesystem where the CM server is running; the default
    is '/opt/cloudera/parcel-repo').

    Please let us know where you specified http://10.219.197.9/cdh/4/ as the
    path. Specifying this is the first step in the above list of steps (copied
    again below). Also, if it wasn't clear that this needs to be a path on the
    local filesystem, let us know; we need to update the help text for the
    field.

    1. Verify the location of the local parcel repository on your Cloudera
    Manager server:

    Go to the Administration page, Properties tab, Parcels category.
    You can change the local repository path in the Local Parcel Repository
    Path property. By default it is /opt/cloudera/parcel-repo.





  • Anil Kumar at Mar 20, 2013 at 11:12 am
    Hi Vikas,

    Thank you very much, Vikas and Wong.

    I installed CDH4 and Impala using CM with your support.

    I have a 2-node cluster (host1, host2) and 1 CM server (host3); I will add
    one more node to my cluster later.

    On host1 I have 10 roles (DN, NN, SNN, GW, Beeswax, Hue, Impala daemon,
    JT, TT, Oozie).

    On host2 I have 6 roles (Impala daemon, Impala statestore daemon, Hive
    metastore server, TT, DN, Gateway).

    I thought Hive and Impala would use one metastore.

    I installed the Hive metastore as MySQL on host2.

    How can I connect to Impala from my Java client program and execute a
    sample query?

    Can you please explain?

    Once again, thanks a lot.

    Thanks,
  • Vikas Singh at Mar 20, 2013 at 4:01 pm
    Adding impala-user.

    As you have just installed, I am assuming you have Impala 0.6. If you want
    to use Java code for querying, you need to use the JDBC driver released by
    Cloudera. Instructions are here:
    https://ccp.cloudera.com/display/IMPALA10BETADOC/Configuring+Impala+to+Work+with+JDBC

    But before moving forward, please make sure that Hive is set up correctly.
    Are you able to execute Hive queries in this setup? Once Hive works, the
    next step is to start impala-shell and execute the same query there. Once
    that works, the next logical step is to write the Java client (with the
    knowledge that any issue you then see is related to the Java client and
    not the Impala/Hive setup).
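The layered check described above can be sketched as follows (the table name is borrowed from later in this thread; substitute your own):

```shell
# 1. Hive itself works:
hive -e 'select count(*) from bidemo.product'

# 2. Impala sees the same metastore and data:
impala-shell -i host2:21000 -q 'select count(*) from bidemo.product'

# 3. Only then write the Java/JDBC client, following the linked document;
#    any remaining failure is then in the client, not in the cluster.
```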

    Let us know if you need any help with this process. You should be able to
    find documentation related to all these on Cloudera site.

    Vikas
  • Anil Kumar at Mar 21, 2013 at 10:39 am
    Hi Vikas,

    Thanks a lot. We followed your steps exactly, and we succeeded. :)

    But we are facing a performance problem with Impala.

    When we execute a query in Hive it takes around 11 seconds:

    *Time taken: 10.969 seconds
    hive>*

    But when we execute the same query in Impala it takes around 15 seconds:

    *Returned 391435 row(s) in 14.17s
    [10.219.197.8:21000] >
    *
    We checked all datanodes, and all impalads are running on all nodes (3
    nodes).

    Can you please tell me how I can get better performance?

    We moved to Cloudera Impala instead of Hive specifically for performance,
    but we are not seeing the expected performance. Please help us here.
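One hedged check, not suggested in the thread itself: `select *` streams all 391,435 rows back through the shell, so much of the 14 s may be result transfer rather than scan time. Comparing against an aggregate separates the two:

```shell
# Engine-only time: a single row comes back to the client.
impala-shell -i 10.219.197.8:21000 -q 'select count(*) from bidemo.product'

# Full-fetch time: every row is serialized and sent to the client.
impala-shell -i 10.219.197.8:21000 -q 'select * from bidemo.product' > /dev/null
```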

    I am attaching a sample log below from executing the query in Impala.

    I0321 15:56:28.428236 30335 impala-beeswax-server.cc:138] query():
    query=select * from bidemo.product
    I0321 15:56:28.428500 30335 impala-beeswax-server.cc:429] query: Query {
    01: query (string) = "select * from bidemo.product",
    03: configuration (list) = list[1] {
    [0] = "",
    },
    04: hadoop_user (string) = "admin",
    }
    I0321 15:56:28.428611 30335 impala-beeswax-server.cc:444]
    TClientRequest.queryOptions: TQueryOptions {
    01: abort_on_error (bool) = false,
    02: max_errors (i32) = 0,
    03: disable_codegen (bool) = false,
    04: batch_size (i32) = 0,
    05: return_as_ascii (bool) = true,
    06: num_nodes (i32) = 0,
    07: max_scan_range_length (i64) = 0,
    08: num_scanner_threads (i32) = 0,
    09: max_io_buffers (i32) = 0,
    10: allow_unsupported_formats (bool) = false,
    11: default_order_by_limit (i64) = -1,
    12: debug_action (string) = "",
    }
    INFO0321 15:56:28.428000 Thread-31 com.cloudera.impala.service.Frontend]
    analyze query select * from bidemo.product
    INFO0321 15:56:28.601000 Thread-31 com.cloudera.impala.service.Frontend]
    create plan
    INFO0321 15:56:28.601000 Thread-31 com.cloudera.impala.planner.Planner]
    create single-node plan
    INFO0321 15:56:28.601000 Thread-31 com.cloudera.impala.planner.Planner]
    create plan fragments
    INFO0321 15:56:28.601000 Thread-31 com.cloudera.impala.planner.Planner]
    finalize plan fragments
    INFO0321 15:56:28.601000 Thread-31
    com.cloudera.impala.planner.HdfsScanNode] collecting partitions for table
    product
    INFO0321 15:56:28.601000 Thread-31 com.cloudera.impala.service.Frontend]
    get scan range locations
    INFO0321 15:56:28.606000 Thread-31 com.cloudera.impala.catalog.HdfsTable]
    loaded partiton PartitionBlockMetadata{#blocks=1, #filenames=1,
    totalStringLen=145}
    INFO0321 15:56:28.627000 Thread-31 com.cloudera.impala.catalog.HdfsTable]
    loaded disk ids for PartitionBlockMetadata{#blocks=1, #filenames=1,
    totalStringLen=145}
    INFO0321 15:56:28.628000 Thread-31 com.cloudera.impala.catalog.HdfsTable]
    block metadata cache: CacheStats{hitCount=12, missCount=9,
    loadSuccessCount=9, loadExceptionCount=0, totalLoadTime=540830224,
    evictionCount=3}
    INFO0321 15:56:28.628000 Thread-31 com.cloudera.impala.service.Frontend]
    create result set metadata
    INFO0321 15:56:28.628000 Thread-31 com.cloudera.impala.service.JniFrontend]
    Plan Fragment 0
    UNPARTITIONED
    EXCHANGE (1)
    TUPLE IDS: 0
    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 1
    UNPARTITIONED
    * SCAN HDFS table=bidemo.product #partitions=1 size=6.84KB (0)
    * TUPLE IDS: 0
    I0321 15:56:28.630588 30335 coordinator.cc:285] Exec()
    query_id=677ba7c609d84684:b56055e7e443ea77
    I0321 15:56:28.630827 30335 simple-scheduler.cc:168] SimpleScheduler
    assignment (data->backend): (10.219.197.9:50010 -> 10.219.197.9:22000),
    (10.219.197.10:50010 -> 10.219.197.10:22000), (10.219.197.8:50010 ->
    10.219.197.8:22000)
    I0321 15:56:28.630874 30335 simple-scheduler.cc:171] *SimpleScheduler
    locality percentage 100% (3 out of 3)
    *I0321 15:56:28.631031 30335 plan-fragment-executor.cc:80] Prepare():
    query_id=677ba7c609d84684:b56055e7e443ea77
    instance_id=677ba7c609d84684:b56055e7e443ea78
    I0321 15:56:28.641679 30335 plan-fragment-executor.cc:93] descriptor table
    for fragment=677ba7c609d84684:b56055e7e443ea78
    tuples:
    Tuple(id=0 size=168 slots=[Slot(id=0 type=INT col=0 offset=4 null=(offset=0
    mask=1)), Slot(id=1 type=STRING col=1 offset=8 null=(offset=0 mask=2)),
    Slot(id=2 type=STRING col=2 offset=24 null=(offset=0 mask=4)), Slot(id=3
    type=STRING col=3 offset=40 null=(offset=0 mask=8)), Slot(id=4 type=STRING
    col=4 offset=56 null=(offset=0 mask=10)), Slot(id=5 type=STRING col=5
    offset=72 null=(offset=0 mask=20)), Slot(id=6 type=STRING col=6 offset=88
    null=(offset=0 mask=40)), Slot(id=7 type=STRING col=7 offset=104
    null=(offset=0 mask=80)), Slot(id=8 type=STRING col=8 offset=120
    null=(offset=1 mask=1)), Slot(id=9 type=STRING col=9 offset=136
    null=(offset=1 mask=2)), Slot(id=10 type=STRING col=10 offset=152
    null=(offset=1 mask=4))])
    I0321 15:56:28.677536 30335 coordinator.cc:377] starting 3 backends for
    query 677ba7c609d84684:b56055e7e443ea77
    I0321 15:56:28.677988 30504 impala-server.cc:1327] ExecPlanFragment()
    instance_id=677ba7c609d84684:b56055e7e443ea7a coord=10.219.197.8:22000
    backend#=1
    I0321 15:56:28.678056 30504 plan-fragment-executor.cc:80] Prepare():
    query_id=677ba7c609d84684:b56055e7e443ea77
    instance_id=677ba7c609d84684:b56055e7e443ea7a
    I0321 15:56:28.681727 30504 plan-fragment-executor.cc:93] descriptor table
    for fragment=677ba7c609d84684:b56055e7e443ea7a
    tuples:
    Tuple(id=0 size=168 slots=[Slot(id=0 type=INT col=0 offset=4 null=(offset=0
    mask=1)), Slot(id=1 type=STRING col=1 offset=8 null=(offset=0 mask=2)),
    Slot(id=2 type=STRING col=2 offset=24 null=(offset=0 mask=4)), Slot(id=3
    type=STRING col=3 offset=40 null=(offset=0 mask=8)), Slot(id=4 type=STRING
    col=4 offset=56 null=(offset=0 mask=10)), Slot(id=5 type=STRING col=5
    offset=72 null=(offset=0 mask=20)), Slot(id=6 type=STRING col=6 offset=88
    null=(offset=0 mask=40)), Slot(id=7 type=STRING col=7 offset=104
    null=(offset=0 mask=80)), Slot(id=8 type=STRING col=8 offset=120
    null=(offset=1 mask=1)), Slot(id=9 type=STRING col=9 offset=136
    null=(offset=1 mask=2)), Slot(id=10 type=STRING col=10 offset=152
    null=(offset=1 mask=4))])
    I0321 15:56:28.749954 30504 client-cache.cc:68] GetClient(): creating
    client for 10.219.197.8:22000
    I0321 15:56:28.750207 6265 plan-fragment-executor.cc:207] Open():
    instance_id=677ba7c609d84684:b56055e7e443ea7a
    I0321 15:56:28.759348 4092 coordinator.cc:1003] Backend 2 completed, 2
    remaining: query_id=677ba7c609d84684:b56055e7e443ea77
    I0321 15:56:28.759418 4092 coordinator.cc:1012]
    query_id=677ba7c609d84684:b56055e7e443ea77: first in-progress backend:
    10.219.197.10:22000
    I0321 15:56:28.762701 30481 coordinator.cc:1003] Backend 0 completed, 1
    remaining: query_id=677ba7c609d84684:b56055e7e443ea77
    I0321 15:56:28.762779 30481 coordinator.cc:1012]
    query_id=677ba7c609d84684:b56055e7e443ea77: first in-progress backend:
    10.219.197.8:22000
    I0321 15:56:28.763306 6270 plan-fragment-executor.cc:207] Open():
    instance_id=677ba7c609d84684:b56055e7e443ea78
    I0321 15:56:28.786658 30504 progress-updater.cc:45] Query
    677ba7c609d84684:b56055e7e443ea77 100% Complete (1 out of 1)
    I0321 15:56:28.786799 30504 coordinator.cc:1003] Backend 1 completed, 0
    remaining: query_id=677ba7c609d84684:b56055e7e443ea77
    I0321 15:56:29.012508 30335 impala-beeswax-server.cc:272]
    get_results_metadata(): query_id=677ba7c609d84684:b56055e7e443ea77

    Thanks,
    Anil
  • Udai Kiran Potluri at Mar 21, 2013 at 11:56 am
    Anil,

    When you go to http://<host-name>:25000/queries, you will see your query;
    can you paste the profile from there? The complete logs might also be
    helpful.

    Thanks,
    Udai

    On Thu, Mar 21, 2013 at 4:09 PM, Anil Kumar wrote:

    Hi Vikas,


    Thanks a lot, we followed your steps exactly and succeeded. :)

    But we are facing a performance problem with Impala.

    When we execute a query in Hive, it takes around 11 seconds:

    *Time taken: 10.969 seconds
    hive>*

    But when we execute the same query in Impala, it takes around 15 seconds:

    *Returned 391435 row(s) in 14.17s
    [10.219.197.8:21000] >
    *
    We checked all DataNodes, and impalad is running on all 3 nodes.

    Can you please tell me how I can get better performance?

    We switched from Hive to Cloudera Impala specifically for performance, but
    we are not seeing the expected improvement. Please help us here.
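    For what it's worth, the 14.17 s figure quoted above likely measures mostly the client fetching and printing 391,435 rows, not the scan itself. A back-of-envelope sketch (the 168-byte per-row figure is the in-memory tuple size from the descriptor table in the log below, used here only as a rough volume estimate, not the serialized ASCII size):

    ```python
    # Rough estimate of the effective client fetch rate for the quoted run.
    # Inputs come from the shell output and log quoted in this thread.
    rows = 391_435          # "Returned 391435 row(s) in 14.17s"
    seconds = 14.17
    bytes_per_row = 168     # tuple size from the descriptor table (approximation)

    rows_per_sec = rows / seconds
    mb_total = rows * bytes_per_row / 1e6

    print(f"{rows_per_sec:,.0f} rows/s, roughly {mb_total:.0f} MB returned to the client")
    ```

    At that volume, a `select *` with no predicate or limit is a poor benchmark for scan speed on either engine; a query that aggregates or filters on the server side would isolate Impala's execution time from result transfer.
    
    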

    I am attaching a sample log below from executing the query in Impala.

    I0321 15:56:28.428236 30335 impala-beeswax-server.cc:138] query():
    query=select * from bidemo.product
    I0321 15:56:28.428500 30335 impala-beeswax-server.cc:429] query: Query {
    01: query (string) = "select * from bidemo.product",
    03: configuration (list) = list[1] {
    [0] = "",
    },
    04: hadoop_user (string) = "admin",
    }
    I0321 15:56:28.428611 30335 impala-beeswax-server.cc:444]
    TClientRequest.queryOptions: TQueryOptions {
    01: abort_on_error (bool) = false,
    02: max_errors (i32) = 0,
    03: disable_codegen (bool) = false,
    04: batch_size (i32) = 0,
    05: return_as_ascii (bool) = true,
    06: num_nodes (i32) = 0,
    07: max_scan_range_length (i64) = 0,
    08: num_scanner_threads (i32) = 0,
    09: max_io_buffers (i32) = 0,
    10: allow_unsupported_formats (bool) = false,
    11: default_order_by_limit (i64) = -1,
    12: debug_action (string) = "",
    }
    INFO0321 15:56:28.428000 Thread-31 com.cloudera.impala.service.Frontend]
    analyze query select * from bidemo.product
    INFO0321 15:56:28.601000 Thread-31 com.cloudera.impala.service.Frontend]
    create plan
    INFO0321 15:56:28.601000 Thread-31 com.cloudera.impala.planner.Planner]
    create single-node plan
    INFO0321 15:56:28.601000 Thread-31 com.cloudera.impala.planner.Planner]
    create plan fragments
    INFO0321 15:56:28.601000 Thread-31 com.cloudera.impala.planner.Planner]
    finalize plan fragments
    INFO0321 15:56:28.601000 Thread-31
    com.cloudera.impala.planner.HdfsScanNode] collecting partitions for table
    product
    INFO0321 15:56:28.601000 Thread-31 com.cloudera.impala.service.Frontend]
    get scan range locations
    INFO0321 15:56:28.606000 Thread-31 com.cloudera.impala.catalog.HdfsTable]
    loaded partiton PartitionBlockMetadata{#blocks=1, #filenames=1,
    totalStringLen=145}
    INFO0321 15:56:28.627000 Thread-31 com.cloudera.impala.catalog.HdfsTable]
    loaded disk ids for PartitionBlockMetadata{#blocks=1, #filenames=1,
    totalStringLen=145}
    INFO0321 15:56:28.628000 Thread-31 com.cloudera.impala.catalog.HdfsTable]
    block metadata cache: CacheStats{hitCount=12, missCount=9,
    loadSuccessCount=9, loadExceptionCount=0, totalLoadTime=540830224,
    evictionCount=3}
    INFO0321 15:56:28.628000 Thread-31 com.cloudera.impala.service.Frontend]
    create result set metadata
    INFO0321 15:56:28.628000 Thread-31
    com.cloudera.impala.service.JniFrontend] Plan Fragment 0
    UNPARTITIONED
    EXCHANGE (1)
    TUPLE IDS: 0
    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 1
    UNPARTITIONED
    SCAN HDFS table=bidemo.product #partitions=1 size=6.84KB (0)
    TUPLE IDS: 0
    I0321 15:56:28.630588 30335 coordinator.cc:285] Exec()
    query_id=677ba7c609d84684:b56055e7e443ea77
    I0321 15:56:28.630827 30335 simple-scheduler.cc:168] SimpleScheduler
    assignment (data->backend): (10.219.197.9:50010 -> 10.219.197.9:22000), (
    10.219.197.10:50010 -> 10.219.197.10:22000), (10.219.197.8:50010 ->
    10.219.197.8:22000)
    I0321 15:56:28.630874 30335 simple-scheduler.cc:171] SimpleScheduler
    locality percentage 100% (3 out of 3)
    I0321 15:56:28.631031 30335 plan-fragment-executor.cc:80] Prepare():
    query_id=677ba7c609d84684:b56055e7e443ea77
    instance_id=677ba7c609d84684:b56055e7e443ea78
    I0321 15:56:28.641679 30335 plan-fragment-executor.cc:93] descriptor table
    for fragment=677ba7c609d84684:b56055e7e443ea78
    tuples:
    Tuple(id=0 size=168 slots=[Slot(id=0 type=INT col=0 offset=4
    null=(offset=0 mask=1)), Slot(id=1 type=STRING col=1 offset=8
    null=(offset=0 mask=2)), Slot(id=2 type=STRING col=2 offset=24
    null=(offset=0 mask=4)), Slot(id=3 type=STRING col=3 offset=40
    null=(offset=0 mask=8)), Slot(id=4 type=STRING col=4 offset=56
    null=(offset=0 mask=10)), Slot(id=5 type=STRING col=5 offset=72
    null=(offset=0 mask=20)), Slot(id=6 type=STRING col=6 offset=88
    null=(offset=0 mask=40)), Slot(id=7 type=STRING col=7 offset=104
    null=(offset=0 mask=80)), Slot(id=8 type=STRING col=8 offset=120
    null=(offset=1 mask=1)), Slot(id=9 type=STRING col=9 offset=136
    null=(offset=1 mask=2)), Slot(id=10 type=STRING col=10 offset=152
    null=(offset=1 mask=4))])
    I0321 15:56:28.677536 30335 coordinator.cc:377] starting 3 backends for
    query 677ba7c609d84684:b56055e7e443ea77
    I0321 15:56:28.677988 30504 impala-server.cc:1327] ExecPlanFragment()
    instance_id=677ba7c609d84684:b56055e7e443ea7a coord=10.219.197.8:22000
    backend#=1
    I0321 15:56:28.678056 30504 plan-fragment-executor.cc:80] Prepare():
    query_id=677ba7c609d84684:b56055e7e443ea77
    instance_id=677ba7c609d84684:b56055e7e443ea7a
    I0321 15:56:28.681727 30504 plan-fragment-executor.cc:93] descriptor table
    for fragment=677ba7c609d84684:b56055e7e443ea7a
    tuples:
    Tuple(id=0 size=168 slots=[Slot(id=0 type=INT col=0 offset=4
    null=(offset=0 mask=1)), Slot(id=1 type=STRING col=1 offset=8
    null=(offset=0 mask=2)), Slot(id=2 type=STRING col=2 offset=24
    null=(offset=0 mask=4)), Slot(id=3 type=STRING col=3 offset=40
    null=(offset=0 mask=8)), Slot(id=4 type=STRING col=4 offset=56
    null=(offset=0 mask=10)), Slot(id=5 type=STRING col=5 offset=72
    null=(offset=0 mask=20)), Slot(id=6 type=STRING col=6 offset=88
    null=(offset=0 mask=40)), Slot(id=7 type=STRING col=7 offset=104
    null=(offset=0 mask=80)), Slot(id=8 type=STRING col=8 offset=120
    null=(offset=1 mask=1)), Slot(id=9 type=STRING col=9 offset=136
    null=(offset=1 mask=2)), Slot(id=10 type=STRING col=10 offset=152
    null=(offset=1 mask=4))])
    I0321 15:56:28.749954 30504 client-cache.cc:68] GetClient(): creating
    client for 10.219.197.8:22000
    I0321 15:56:28.750207 6265 plan-fragment-executor.cc:207] Open():
    instance_id=677ba7c609d84684:b56055e7e443ea7a
    I0321 15:56:28.759348 4092 coordinator.cc:1003] Backend 2 completed, 2
    remaining: query_id=677ba7c609d84684:b56055e7e443ea77
    I0321 15:56:28.759418 4092 coordinator.cc:1012]
    query_id=677ba7c609d84684:b56055e7e443ea77: first in-progress backend:
    10.219.197.10:22000
    I0321 15:56:28.762701 30481 coordinator.cc:1003] Backend 0 completed, 1
    remaining: query_id=677ba7c609d84684:b56055e7e443ea77
    I0321 15:56:28.762779 30481 coordinator.cc:1012]
    query_id=677ba7c609d84684:b56055e7e443ea77: first in-progress backend:
    10.219.197.8:22000
    I0321 15:56:28.763306 6270 plan-fragment-executor.cc:207] Open():
    instance_id=677ba7c609d84684:b56055e7e443ea78
    I0321 15:56:28.786658 30504 progress-updater.cc:45] Query
    677ba7c609d84684:b56055e7e443ea77 100% Complete (1 out of 1)
    I0321 15:56:28.786799 30504 coordinator.cc:1003] Backend 1 completed, 0
    remaining: query_id=677ba7c609d84684:b56055e7e443ea77
    I0321 15:56:29.012508 30335 impala-beeswax-server.cc:272]
    get_results_metadata(): query_id=677ba7c609d84684:b56055e7e443ea77

    Thanks,
    Anil
  • Anil Kumar at Mar 22, 2013 at 3:18 am
    Hi Udai,

    I went to http://10.219.197.8:25000/queries.

    On this page I saw the result below.

    6983882f9f5a47b4:9545125340d8f7bc | select * from bidemo.sales | QUERY
    2013-03-22 08:22:47 | 0 / 1 ( 0%) | FINISHED | 100
    *Query Profile:*

    Query (id=6983882f9f5a47b4:9545125340d8f7bc):
    - PlanningTime: 77ms
    Query 6983882f9f5a47b4:9545125340d8f7bc:(605ms 0.00%)
    Aggregate Profile:
    Coordinator Fragment:(65ms 0.00%)
    - RowsProduced: 1.02K
    CodeGen:
    - CodegenTime: 0K clock cycles
    - CompileTime: 104ms
    - LoadTime: 197ms
    - ModuleFileSize: 37.04 KB
    EXCHANGE_NODE (id=1):(65ms 0.00%)
    - BytesReceived: 1.25 MB
    - ConvertRowBatchTime: 34K clock cycles
    - DeserializeRowBatchTimer: 2ms
    - MemoryUsed: 0.00
    - RowsReturned: 1.02K
    - RowsReturnedRate: 15.74 K/sec
    Averaged Fragment 1:
    split sizes: min: 0.00 , max: 19.25 MB, avg: 6.42 MB, stddev: 9.07 MB
    Fragment 1:
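    The split-size line in this profile (min 0.00, max 19.25 MB, avg 6.42 MB, stddev 9.07 MB over the 3 backends) is worth a closer look: the only assignment consistent with those statistics is that one node reads the entire 19.25 MB split while the other two read nothing, so the scan is effectively single-node. A quick sketch (assuming a population standard deviation, which is what matches the profile):

    ```python
    import statistics

    # Hypothetical per-backend split sizes (MB) reconstructed from the profile:
    # one 19.25 MB split on a single node, zero on the other two backends.
    splits_mb = [0.0, 0.0, 19.25]

    avg = statistics.fmean(splits_mb)      # mean of the three split sizes
    stddev = statistics.pstdev(splits_mb)  # population standard deviation

    print(f"avg={avg:.2f} MB, stddev={stddev:.2f} MB")
    ```

    With a table this small stored as a single file, only one backend has any data to scan, so adding nodes cannot speed this query up; per-query startup and result-fetch overhead dominate the runtime.
    
    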

    *Corresponding Logs:*
    I0322 08:22:47.914163 30335 impala-beeswax-server.cc:138] query():
    query=select * from bidemo.sales
    I0322 08:22:47.931503 30335 impala-beeswax-server.cc:429] query: Query {
    01: query (string) = "select * from bidemo.sales",
    03: configuration (list) = list[1] {
    [0] = "",
    },
    04: hadoop_user (string) = "admin",
    }
    I0322 08:22:47.931627 30335 impala-beeswax-server.cc:444]
    TClientRequest.queryOptions: TQueryOptions {
    01: abort_on_error (bool) = false,
    02: max_errors (i32) = 0,
    03: disable_codegen (bool) = false,
    04: batch_size (i32) = 0,
    05: return_as_ascii (bool) = true,
    06: num_nodes (i32) = 0,
    07: max_scan_range_length (i64) = 0,
    08: num_scanner_threads (i32) = 0,
    09: max_io_buffers (i32) = 0,
    10: allow_unsupported_formats (bool) = false,
    11: default_order_by_limit (i64) = -1,
    12: debug_action (string) = "",
    }
    INFO0322 08:22:47.932000 Thread-31 com.cloudera.impala.service.Frontend]
    analyze query select * from bidemo.sales
    INFO0322 08:22:47.959000 Thread-31 com.cloudera.impala.service.Frontend]
    create plan
    INFO0322 08:22:47.959000 Thread-31 com.cloudera.impala.planner.Planner]
    create single-node plan
    INFO0322 08:22:47.959000 Thread-31 com.cloudera.impala.planner.Planner]
    create plan fragments
    INFO0322 08:22:47.960000 Thread-31 com.cloudera.impala.planner.Planner]
    finalize plan fragments
    INFO0322 08:22:47.960000 Thread-31
    com.cloudera.impala.planner.HdfsScanNode] collecting partitions for table
    sales
    INFO0322 08:22:47.961000 Thread-31 com.cloudera.impala.service.Frontend]
    get scan range locations
    INFO0322 08:22:47.981000 Thread-31 com.cloudera.impala.catalog.HdfsTable]
    loaded partiton PartitionBlockMetadata{#blocks=1, #filenames=1,
    totalStringLen=86}
    INFO0322 08:22:48.004000 Thread-31 com.cloudera.impala.catalog.HdfsTable]
    loaded disk ids for PartitionBlockMetadata{#blocks=1, #filenames=1,
    totalStringLen=86}
    INFO0322 08:22:48.005000 Thread-31 com.cloudera.impala.catalog.HdfsTable]
    block metadata cache: CacheStats{hitCount=13, missCount=10,
    loadSuccessCount=10, loadExceptionCount=0, totalLoadTime=583890510,
    evictionCount=4}
    INFO0322 08:22:48.006000 Thread-31 com.cloudera.impala.service.Frontend]
    create result set metadata
    INFO0322 08:22:48.006000 Thread-31 com.cloudera.impala.service.JniFrontend]
    Plan Fragment 0
    UNPARTITIONED
    EXCHANGE (1)
    TUPLE IDS: 0
    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 1
    UNPARTITIONED
    SCAN HDFS table=bidemo.sales #partitions=1 size=19.25MB (0)
    TUPLE IDS: 0
    I0322 08:22:48.009865 30335 coordinator.cc:285] Exec()
    query_id=6983882f9f5a47b4:9545125340d8f7bc
    I0322 08:22:48.010149 30335 simple-scheduler.cc:168] SimpleScheduler
    assignment (data->backend): (10.219.197.9:50010 -> 10.219.197.9:22000),
    (10.219.197.10:50010 -> 10.219.197.10:22000), (10.219.197.8:50010 ->
    10.219.197.8:22000)
    I0322 08:22:48.010196 30335 simple-scheduler.cc:171] SimpleScheduler
    locality percentage 100% (3 out of 3)
    I0322 08:22:48.010329 30335 plan-fragment-executor.cc:80] Prepare():
    query_id=6983882f9f5a47b4:9545125340d8f7bc
    instance_id=6983882f9f5a47b4:9545125340d8f7bd
    I0322 08:22:48.234875 30335 plan-fragment-executor.cc:93] descriptor table
    for fragment=6983882f9f5a47b4:9545125340d8f7bd
    tuples:
    Tuple(id=0 size=168 slots=[Slot(id=0 type=STRING col=0 offset=56
    null=(offset=1 mask=10)), Slot(id=1 type=STRING col=1 offset=72
    null=(offset=1 mask=20)), Slot(id=2 type=STRING col=2 offset=88
    null=(offset=1 mask=40)), Slot(id=3 type=STRING col=3 offset=104
    null=(offset=1 mask=80)), Slot(id=4 type=STRING col=4 offset=120
    null=(offset=2 mask=1)), Slot(id=5 type=STRING col=5 offset=136
    null=(offset=2 mask=2)), Slot(id=6 type=STRING col=6 offset=152
    null=(offset=2 mask=4)), Slot(id=7 type=INT col=7 offset=4 null=(offset=0
    mask=1)), Slot(id=8 type=INT col=8 offset=8 null=(offset=0 mask=2)),
    Slot(id=9 type=INT col=9 offset=12 null=(offset=0 mask=4)), Slot(id=10
    type=INT col=10 offset=16 null=(offset=0 mask=8)), Slot(id=11 type=INT
    col=11 offset=20 null=(offset=0 mask=10)), Slot(id=12 type=INT col=12
    offset=24 null=(offset=0 mask=20)), Slot(id=13 type=INT col=13 offset=28
    null=(offset=0 mask=40)), Slot(id=14 type=INT col=14 offset=32
    null=(offset=0 mask=80)), Slot(id=15 type=INT col=15 offset=36
    null=(offset=1 mask=1)), Slot(id=16 type=INT col=16 offset=40
    null=(offset=1 mask=2)), Slot(id=17 type=INT col=17 offset=44
    null=(offset=1 mask=4)), Slot(id=18 type=INT col=18 offset=48
    null=(offset=1 mask=8))])
    I0322 08:22:48.339468 30335 coordinator.cc:377] starting 3 backends for
    query 6983882f9f5a47b4:9545125340d8f7bc
    I0322 08:22:48.339807 6264 impala-server.cc:1327] ExecPlanFragment()
    instance_id=6983882f9f5a47b4:9545125340d8f7bf coord=10.219.197.8:22000
    backend#=1
    I0322 08:22:48.339838 6264 plan-fragment-executor.cc:80] Prepare():
    query_id=6983882f9f5a47b4:9545125340d8f7bc
    instance_id=6983882f9f5a47b4:9545125340d8f7bf
    I0322 08:22:48.343637 6264 plan-fragment-executor.cc:93] descriptor table
    for fragment=6983882f9f5a47b4:9545125340d8f7bf
    tuples:
    Tuple(id=0 size=168 slots=[Slot(id=0 type=STRING col=0 offset=56
    null=(offset=1 mask=10)), Slot(id=1 type=STRING col=1 offset=72
    null=(offset=1 mask=20)), Slot(id=2 type=STRING col=2 offset=88
    null=(offset=1 mask=40)), Slot(id=3 type=STRING col=3 offset=104
    null=(offset=1 mask=80)), Slot(id=4 type=STRING col=4 offset=120
    null=(offset=2 mask=1)), Slot(id=5 type=STRING col=5 offset=136
    null=(offset=2 mask=2)), Slot(id=6 type=STRING col=6 offset=152
    null=(offset=2 mask=4)), Slot(id=7 type=INT col=7 offset=4 null=(offset=0
    mask=1)), Slot(id=8 type=INT col=8 offset=8 null=(offset=0 mask=2)),
    Slot(id=9 type=INT col=9 offset=12 null=(offset=0 mask=4)), Slot(id=10
    type=INT col=10 offset=16 null=(offset=0 mask=8)), Slot(id=11 type=INT
    col=11 offset=20 null=(offset=0 mask=10)), Slot(id=12 type=INT col=12
    offset=24 null=(offset=0 mask=20)), Slot(id=13 type=INT col=13 offset=28
    null=(offset=0 mask=40)), Slot(id=14 type=INT col=14 offset=32
    null=(offset=0 mask=80)), Slot(id=15 type=INT col=15 offset=36
    null=(offset=1 mask=1)), Slot(id=16 type=INT col=16 offset=40
    null=(offset=1 mask=2)), Slot(id=17 type=INT col=17 offset=44
    null=(offset=1 mask=4)), Slot(id=18 type=INT col=18 offset=48
    null=(offset=1 mask=8))])
    I0322 08:22:48.551887 4092 coordinator.cc:1003] Backend 2 completed, 2
    remaining: query_id=6983882f9f5a47b4:9545125340d8f7bc
    I0322 08:22:48.552000 4092 coordinator.cc:1012]
    query_id=6983882f9f5a47b4:9545125340d8f7bc: first in-progress backend:
    10.219.197.8:22000
    I0322 08:22:48.552098 12795 plan-fragment-executor.cc:207] Open():
    instance_id=6983882f9f5a47b4:9545125340d8f7bd
    I0322 08:22:48.552136 30481 coordinator.cc:1003] Backend 0 completed, 1
    remaining: query_id=6983882f9f5a47b4:9545125340d8f7bc
    I0322 08:22:48.552253 30481 coordinator.cc:1012]
    query_id=6983882f9f5a47b4:9545125340d8f7bc: first in-progress backend:
    10.219.197.8:22000
    I0322 08:22:48.552397 12794 plan-fragment-executor.cc:207] Open():
    instance_id=6983882f9f5a47b4:9545125340d8f7bf
    I0322 08:22:49.116104 30335 impala-beeswax-server.cc:272]
    get_results_metadata(): query_id=6983882f9f5a47b4:9545125340d8f7bc
    I0322 08:22:51.563602 4252 client-cache.cc:68] GetClient(): creating
    client for 10.219.197.8:22000.

    Could you please tell me how to improve the performance of Impala?

    Thanks,
    Anil.
  • Anil Kumar at Mar 22, 2013 at 3:54 am
    Hi Udai,

    Here are the latest logs.

    82a22caa76ed4ace:8b95a9e078a99be3 select * from
    bidemo.sales QUERY 2013-03-22 09:17:42 0 / 1 ( 0%) FINISHED 100

    *Query Profile:*
    Query (id=82a22caa76ed4ace:8b95a9e078a99be3):
    - PlanningTime: 3s369ms
    Query 82a22caa76ed4ace:8b95a9e078a99be3:(786ms 0.00%)
    Aggregate Profile:
    Coordinator Fragment:(6ms 0.00%)
    - RowsProduced: 1.02K
    CodeGen:
    - CodegenTime: 0K clock cycles
    - CompileTime: 124ms
    - LoadTime: 12ms
    - ModuleFileSize: 37.04 KB
    EXCHANGE_NODE (id=1):(6ms 0.00%)
    - BytesReceived: 1.25 MB
    - ConvertRowBatchTime: 32K clock cycles
    - DeserializeRowBatchTimer: 2ms
    - MemoryUsed: 0.00
    - RowsReturned: 1.02K
    - RowsReturnedRate: 167.71 K/sec
    Averaged Fragment 1:
    split sizes: min: 0.00 , max: 19.25 MB, avg: 6.42 MB, stddev: 9.07 MB
    Fragment 1:

    *Complete logs after restarting all Impala daemons and the statestore:*

    Log file created at: 2013/03/22 09:06:04
    Running on machine: HYDPCM308106D.ad.xxx.com
    Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
    I0322 09:06:04.548318 13093 daemon.cc:34] impalad version 0.6 RELEASE
    (build 720f93c4875ccc0ac8f1b55937bd4800f010db6a)
    Built on Sat, 23 Feb 2013 19:15:00 PST
    I0322 09:06:04.548877 13093 daemon.cc:35] Using hostname:
    hydpcm308106d.ad.xxx.com
    I0322 09:06:04.549726 13093 logging.cc:76] Flags (see also /varz are on
    debug webserver):
    --dump_ir=false
    --module_output=
    --be_port=22000
    --classpath=
    --hostname=hydpcm308106d.ad.xxx.com
    --ipaddress=10.219.197.8
    --keytab_file=
    --planservice_host=localhost
    --planservice_port=20000
    --principal=
    --max_row_batches=0
    --randomize_scan_ranges=false
    --num_disks=0
    --num_threads_per_disk=1
    --read_size=8388608
    --enable_webserver=true
    --use_statestore=true
    --nn=hydpcm308106d.ad.xxx.com
    --nn_port=8020
    --serialize_batch=false
    --status_report_interval=5
    --abort_on_config_error=true
    --be_service_threads=64
    --beeswax_port=21000
    --default_query_options=
    --fe_service_threads=64
    --heap_profile_dir=
    --hs2_port=21050
    --load_catalog_at_startup=false
    --log_mem_usage_interval=0
    --mem_limit=-1
    --query_log_size=25
    --use_planservice=false
    --statestore_subscriber_timeout_seconds=10
    --state_store_host=hydpcm308859d.ad.xxx.com
    --state_store_port=24000
    --state_store_subscriber_port=23000
    --kerberos_reinit_interval=60
    --sasl_path=/usr/lib/sasl2:/usr/lib64/sasl2:/usr/local/lib/sasl2:/usr/lib/x86_64-linux-gnu/sasl2
    --web_log_bytes=1048576
    --log_filename=impalad
    --rpc_cnxn_attempts=10
    --rpc_cnxn_retry_interval_ms=2000
    --enable_webserver_doc_root=true
    --webserver_doc_root=/opt/cloudera/parcels/IMPALA-0.6-1.p0.109/lib/impala
    --webserver_interface=
    --webserver_port=25000
    --flagfile=/var/run/cloudera-scm-agent/process/157-impala-IMPALAD/impala-conf/impalad_flags
    --fromenv=
    --tryfromenv=
    --undefok=
    --tab_completion_columns=80
    --tab_completion_word=
    --help=false
    --helpfull=false
    --helpmatch=
    --helpon=
    --helppackage=false
    --helpshort=false
    --helpxml=false
    --version=false
    --alsologtoemail=
    --alsologtostderr=false
    --drop_log_memory=true
    --log_backtrace_at=
    --log_dir=/var/log/impalad
    --log_link=
    --log_prefix=true
    --logbuflevel=-1
    --logbufsecs=30
    --logemaillevel=999
    --logmailer=/bin/mail
    --logtostderr=false
    --max_log_size=200
    --minloglevel=0
    --stderrthreshold=2
    --stop_logging_if_full_disk=false
    --symbolize_stacktrace=true
    --v=1
    --vmodule=
    I0322 09:06:04.561517 13093 cpu-info.cc:121] Cpu Info:
    Cores: 2
    L1 Cache: 64.00 KB
    L2 Cache: 1024.00 KB
    L3 Cache: 0.00
    Hardware Supports:
    I0322 09:06:04.561869 13093 disk-info.cc:95] Disk Info:
    Num disks 1: sda
    I0322 09:06:08.674049 13093 impala-server.cc:1477] Default query
    options:TQueryOptions {
    01: abort_on_error (bool) = false,
    02: max_errors (i32) = 0,
    03: disable_codegen (bool) = false,
    04: batch_size (i32) = 0,
    05: return_as_ascii (bool) = true,
    06: num_nodes (i32) = 0,
    07: max_scan_range_length (i64) = 0,
    08: num_scanner_threads (i32) = 0,
    09: max_io_buffers (i32) = 0,
    10: allow_unsupported_formats (bool) = false,
    11: default_order_by_limit (i64) = -1,
    12: debug_action (string) = "",
    }
    WARN0322 09:06:08.825000 main org.apache.hadoop.conf.Configuration]
    mapred.max.split.size is deprecated. Instead, use
    mapreduce.input.fileinputformat.split.maxsize
    WARN0322 09:06:08.825000 main org.apache.hadoop.conf.Configuration]
    mapred.min.split.size is deprecated. Instead, use
    mapreduce.input.fileinputformat.split.minsize
    WARN0322 09:06:08.826000 main org.apache.hadoop.conf.Configuration]
    mapred.min.split.size.per.rack is deprecated. Instead, use
    mapreduce.input.fileinputformat.split.minsize.per.rack
    WARN0322 09:06:08.826000 main org.apache.hadoop.conf.Configuration]
    mapred.min.split.size.per.node is deprecated. Instead, use
    mapreduce.input.fileinputformat.split.minsize.per.node
    WARN0322 09:06:08.826000 main org.apache.hadoop.conf.Configuration]
    mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    WARN0322 09:06:08.826000 main org.apache.hadoop.conf.Configuration]
    mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
    mapreduce.reduce.speculative
    WARN0322 09:06:09.040000 main org.apache.hadoop.conf.Configuration]
    org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@57d437c:an attempt
    to override final parameter:
    mapreduce.job.end-notification.max.retry.interval; Ignoring.
    WARN0322 09:06:09.049000 main org.apache.hadoop.conf.Configuration]
    org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@57d437c:an attempt
    to override final parameter: mapreduce.job.end-notification.max.attempts;
    Ignoring.
    WARN0322 09:06:09.053000 main org.apache.hadoop.hive.conf.HiveConf]
    DEPRECATED: Configuration property hive.metastore.local no longer has any
    effect. Make sure to provide a valid value for hive.metastore.uris if you
    are connecting to a remote metastore.
    INFO0322 09:06:09.129000 main hive.metastore] Trying to connect to
    metastore with URI thrift://hydpcm308102d.ad.xxx.com:9083
    WARN0322 09:06:09.234000 main hive.metastore] Failed to connect to the
    MetaStore Server...
    INFO0322 09:06:09.234000 main hive.metastore] Trying to connect to
    metastore with URI thrift://hydpcm308859d.ad.xxx.com:9083
    INFO0322 09:06:09.323000 main hive.metastore] Waiting 1 seconds before next
    connection attempt.
    INFO0322 09:06:10.324000 main hive.metastore] Connected to metastore.
    INFO0322 09:06:10.325000 main hive.metastore] Trying to connect to
    metastore with URI thrift://hydpcm308102d.ad.xxx.com:9083
    WARN0322 09:06:10.326000 main hive.metastore] Failed to connect to the
    MetaStore Server...
    INFO0322 09:06:10.327000 main hive.metastore] Trying to connect to
    metastore with URI thrift://hydpcm308859d.ad.xxx.com:9083
    INFO0322 09:06:10.330000 main hive.metastore] Waiting 1 seconds before next
    connection attempt.
    INFO0322 09:06:11.330000 main hive.metastore] Connected to metastore.
    INFO0322 09:06:11.332000 main hive.metastore] Trying to connect to
    metastore with URI thrift://hydpcm308102d.ad.xxx.com:9083
    WARN0322 09:06:11.333000 main hive.metastore] Failed to connect to the
    MetaStore Server...
    INFO0322 09:06:11.333000 main hive.metastore] Trying to connect to
    metastore with URI thrift://hydpcm308859d.ad.xxx.com:9083
    INFO0322 09:06:11.336000 main hive.metastore] Waiting 1 seconds before next
    connection attempt.
    INFO0322 09:06:12.336000 main hive.metastore] Connected to metastore.
    INFO0322 09:06:12.338000 main hive.metastore] Trying to connect to
    metastore with URI thrift://hydpcm308102d.ad.xxx.com:9083
    WARN0322 09:06:12.339000 main hive.metastore] Failed to connect to the
    MetaStore Server...
    INFO0322 09:06:12.339000 main hive.metastore] Trying to connect to
    metastore with URI thrift://hydpcm308859d.ad.xxx.com:9083
    INFO0322 09:06:12.342000 main hive.metastore] Waiting 1 seconds before next
    connection attempt.
    INFO0322 09:06:13.343000 main hive.metastore] Connected to metastore.
    INFO0322 09:06:13.344000 main hive.metastore] Trying to connect to
    metastore with URI thrift://hydpcm308102d.ad.xxx.com:9083
    WARN0322 09:06:13.345000 main hive.metastore] Failed to connect to the
    MetaStore Server...
    INFO0322 09:06:13.345000 main hive.metastore] Trying to connect to
    metastore with URI thrift://hydpcm308859d.ad.xxx.com:9083
    INFO0322 09:06:13.348000 main hive.metastore] Waiting 1 seconds before next
    connection attempt.
    INFO0322 09:06:14.348000 main hive.metastore] Connected to metastore.
    I0322 09:06:14.428660 13093 impala-server.cc:1709] Impala Beeswax Service
    listening on 21000
    I0322 09:06:14.428704 13093 impala-server.cc:1720] Impala HiveServer2
    Service listening on 21050
    I0322 09:06:14.428714 13093 impala-server.cc:1728] ImpalaInternalService
    listening on 22000
    I0322 09:06:14.428984 13093 thrift-server.cc:341] ThriftServer
    'ImpalaServer Backend' started on port: 22000
    I0322 09:06:14.428997 13093 exec-env.cc:79] Starting global services
    I0322 09:06:14.429008 13093 webserver.cc:145] Starting webserver on all
    interfaces, port 25000
    I0322 09:06:14.429019 13093 webserver.cc:155] Document root:
    /opt/cloudera/parcels/IMPALA-0.6-1.p0.109/lib/impala
    I0322 09:06:14.429399 13093 webserver.cc:194] Webserver started
    I0322 09:06:14.429429 13093 subscription-manager.cc:73] Starting
    subscription manager
    I0322 09:06:14.429761 13093 thrift-server.cc:341] ThriftServer
    'StateStoreSubscriber' started on port: 23000
    I0322 09:06:14.429947 13093 state-store-subscriber.cc:234]
    StateStoreSubscriber listening on 23000
    I0322 09:06:14.430001 13093 simple-scheduler.cc:82] Starting simple
    scheduler
    I0322 09:06:14.441184 13093 state-store-subscriber.cc:129] Attempting to
    register subscriber for services impala_backend_service at
    10.219.197.8:23000
    I0322 09:06:14.441983 13093 state-store-subscriber.cc:77] Attempting to
    register service impala_backend_service on address 10.219.197.8:22000, to
    subscriber at 10.219.197.8:23000
    I0322 09:06:14.442802 13093 state-store-subscriber.cc:129] Attempting to
    register subscriber for services impala_backend_service at
    10.219.197.8:23000
    I0322 09:06:14.447737 13093 thrift-server.cc:341] ThriftServer
    'ImpalaServer Beeswax Frontend' started on port: 21000
    I0322 09:06:14.451807 13093 thrift-server.cc:341] ThriftServer
    'ImpalaServer HiveServer2 Frontend' started on port: 21050
    I0322 09:06:14.451972 13093 impalad-main.cc:129] Impala has started.
    I0322 09:08:24.862784 13297 impala-server.cc:1327] ExecPlanFragment()
    instance_id=c42efdc6b34b4b0e:bd193598f6cc0596 coord=10.219.197.10:22000
    backend#=1
    I0322 09:08:24.862973 13297 plan-fragment-executor.cc:80] Prepare():
    query_id=c42efdc6b34b4b0e:bd193598f6cc0593
    instance_id=c42efdc6b34b4b0e:bd193598f6cc0596
    I0322 09:08:24.872961 13297 plan-fragment-executor.cc:93] descriptor table
    for fragment=c42efdc6b34b4b0e:bd193598f6cc0596
    tuples:
    Tuple(id=0 size=168 slots=[Slot(id=0 type=STRING col=0 offset=56
    null=(offset=1 mask=10)), Slot(id=1 type=STRING col=1 offset=72
    null=(offset=1 mask=20)), Slot(id=2 type=STRING col=2 offset=88
    null=(offset=1 mask=40)), Slot(id=3 type=STRING col=3 offset=104
    null=(offset=1 mask=80)), Slot(id=4 type=STRING col=4 offset=120
    null=(offset=2 mask=1)), Slot(id=5 type=STRING col=5 offset=136
    null=(offset=2 mask=2)), Slot(id=6 type=STRING col=6 offset=152
    null=(offset=2 mask=4)), Slot(id=7 type=INT col=7 offset=4 null=(offset=0
    mask=1)), Slot(id=8 type=INT col=8 offset=8 null=(offset=0 mask=2)),
    Slot(id=9 type=INT col=9 offset=12 null=(offset=0 mask=4)), Slot(id=10
    type=INT col=10 offset=16 null=(offset=0 mask=8)), Slot(id=11 type=INT
    col=11 offset=20 null=(offset=0 mask=10)), Slot(id=12 type=INT col=12
    offset=24 null=(offset=0 mask=20)), Slot(id=13 type=INT col=13 offset=28
    null=(offset=0 mask=40)), Slot(id=14 type=INT col=14 offset=32
    null=(offset=0 mask=80)), Slot(id=15 type=INT col=15 offset=36
    null=(offset=1 mask=1)), Slot(id=16 type=INT col=16 offset=40
    null=(offset=1 mask=2)), Slot(id=17 type=INT col=17 offset=44
    null=(offset=1 mask=4)), Slot(id=18 type=INT col=18 offset=48
    null=(offset=1 mask=8))])
    I0322 09:08:25.033979 13297 client-cache.cc:68] GetClient(): creating
    client for 10.219.197.10:22000
    I0322 09:08:25.034325 13299 plan-fragment-executor.cc:207] Open():
    instance_id=c42efdc6b34b4b0e:bd193598f6cc0596

    I0322 09:17:42.586603 13157 impala-beeswax-server.cc:138] query():
    query=select * from bidemo.sales
    I0322 09:17:42.586851 13157 impala-beeswax-server.cc:429] query: Query {
    01: query (string) = "select * from bidemo.sales",
    03: configuration (list) = list[1] {
    [0] = "",
    },
    04: hadoop_user (string) = "admin",
    }
    I0322 09:17:42.586968 13157 impala-beeswax-server.cc:444]
    TClientRequest.queryOptions: TQueryOptions {
    01: abort_on_error (bool) = false,
    02: max_errors (i32) = 0,
    03: disable_codegen (bool) = false,
    04: batch_size (i32) = 0,
    05: return_as_ascii (bool) = true,
    06: num_nodes (i32) = 0,
    07: max_scan_range_length (i64) = 0,
    08: num_scanner_threads (i32) = 0,
    09: max_io_buffers (i32) = 0,
    10: allow_unsupported_formats (bool) = false,
    11: default_order_by_limit (i64) = -1,
    12: debug_action (string) = "",
    }
    INFO0322 09:17:42.631000 Thread-2 com.cloudera.impala.service.Frontend]
    analyze query select * from bidemo.sales
    INFO0322 09:17:43.242000 Thread-2 com.cloudera.impala.service.Frontend]
    create plan
    INFO0322 09:17:43.255000 Thread-2 com.cloudera.impala.planner.Planner]
    create single-node plan
    INFO0322 09:17:43.256000 Thread-2 com.cloudera.impala.planner.Planner]
    create plan fragments
    INFO0322 09:17:43.258000 Thread-2 com.cloudera.impala.planner.Planner]
    finalize plan fragments
    INFO0322 09:17:43.259000 Thread-2 com.cloudera.impala.planner.HdfsScanNode]
    collecting partitions for table sales
    INFO0322 09:17:43.345000 Thread-2 com.cloudera.impala.service.Frontend] get
    scan range locations
    INFO0322 09:17:43.385000 Thread-2 com.cloudera.impala.catalog.HdfsTable]
    loaded partiton PartitionBlockMetadata{#blocks=1, #filenames=1,
    totalStringLen=141}
    INFO0322 09:17:43.458000 Thread-2 com.cloudera.impala.catalog.HdfsTable]
    loaded disk ids for PartitionBlockMetadata{#blocks=1, #filenames=1,
    totalStringLen=141}
    INFO0322 09:17:43.459000 Thread-2 com.cloudera.impala.catalog.HdfsTable]
    block metadata cache: CacheStats{hitCount=0, missCount=1,
    loadSuccessCount=1, loadExceptionCount=0, totalLoadTime=111916952,
    evictionCount=0}
    INFO0322 09:17:43.493000 Thread-2 com.cloudera.impala.service.Frontend]
    create result set metadata
    INFO0322 09:17:43.498000 Thread-2 com.cloudera.impala.service.JniFrontend]
    Plan Fragment 0
    UNPARTITIONED
    EXCHANGE (1)
    TUPLE IDS: 0
    Plan Fragment 1
    RANDOM
    STREAM DATA SINK
    EXCHANGE ID: 1
    UNPARTITIONED
    SCAN HDFS table=bidemo.sales #partitions=1 size=19.25MB (0)
    TUPLE IDS: 0
    I0322 09:17:43.518561 13157 coordinator.cc:285] Exec()
    query_id=82a22caa76ed4ace:8b95a9e078a99be3
    I0322 09:17:43.518682 13157 simple-scheduler.cc:168] SimpleScheduler
    assignment (data->backend): (10.219.197.9:50010 -> 10.219.197.9:22000),
    (10.219.197.10:50010 -> 10.219.197.10:22000), (10.219.197.8:50010 ->
    10.219.197.8:22000)
    I0322 09:17:43.518693 13157 simple-scheduler.cc:171] SimpleScheduler
    locality percentage 100% (3 out of 3)
    I0322 09:17:43.518748 13157 plan-fragment-executor.cc:80] Prepare():
    query_id=82a22caa76ed4ace:8b95a9e078a99be3
    instance_id=82a22caa76ed4ace:8b95a9e078a99be4
    I0322 09:17:43.522248 13157 plan-fragment-executor.cc:93] descriptor table
    for fragment=82a22caa76ed4ace:8b95a9e078a99be4
    tuples:
    Tuple(id=0 size=168 slots=[Slot(id=0 type=STRING col=0 offset=56
    null=(offset=1 mask=10)), Slot(id=1 type=STRING col=1 offset=72
    null=(offset=1 mask=20)), Slot(id=2 type=STRING col=2 offset=88
    null=(offset=1 mask=40)), Slot(id=3 type=STRING col=3 offset=104
    null=(offset=1 mask=80)), Slot(id=4 type=STRING col=4 offset=120
    null=(offset=2 mask=1)), Slot(id=5 type=STRING col=5 offset=136
    null=(offset=2 mask=2)), Slot(id=6 type=STRING col=6 offset=152
    null=(offset=2 mask=4)), Slot(id=7 type=INT col=7 offset=4 null=(offset=0
    mask=1)), Slot(id=8 type=INT col=8 offset=8 null=(offset=0 mask=2)),
    Slot(id=9 type=INT col=9 offset=12 null=(offset=0 mask=4)), Slot(id=10
    type=INT col=10 offset=16 null=(offset=0 mask=8)), Slot(id=11 type=INT
    col=11 offset=20 null=(offset=0 mask=10)), Slot(id=12 type=INT col=12
    offset=24 null=(offset=0 mask=20)), Slot(id=13 type=INT col=13 offset=28
    null=(offset=0 mask=40)), Slot(id=14 type=INT col=14 offset=32
    null=(offset=0 mask=80)), Slot(id=15 type=INT col=15 offset=36
    null=(offset=1 mask=1)), Slot(id=16 type=INT col=16 offset=40
    null=(offset=1 mask=2)), Slot(id=17 type=INT col=17 offset=44
    null=(offset=1 mask=4)), Slot(id=18 type=INT col=18 offset=48
    null=(offset=1 mask=8))])
    I0322 09:17:43.556880 13157 coordinator.cc:377] starting 3 backends for
    query 82a22caa76ed4ace:8b95a9e078a99be3
    I0322 09:17:43.557260 13363 client-cache.cc:68] GetClient(): creating
    client for 10.219.197.8:22000
    I0322 09:17:43.557416 13365 impala-server.cc:1327] ExecPlanFragment()
    instance_id=82a22caa76ed4ace:8b95a9e078a99be6 coord=10.219.197.8:22000
    backend#=1
    I0322 09:17:43.557441 13365 plan-fragment-executor.cc:80] Prepare():
    query_id=82a22caa76ed4ace:8b95a9e078a99be3
    instance_id=82a22caa76ed4ace:8b95a9e078a99be6
    I0322 09:17:43.557760 13364 client-cache.cc:68] GetClient(): creating
    client for 10.219.197.9:22000
    I0322 09:17:43.560917 13365 plan-fragment-executor.cc:93] descriptor table
    for fragment=82a22caa76ed4ace:8b95a9e078a99be6
    tuples:
    Tuple(id=0 size=168 slots=[Slot(id=0 type=STRING col=0 offset=56
    null=(offset=1 mask=10)), Slot(id=1 type=STRING col=1 offset=72
    null=(offset=1 mask=20)), Slot(id=2 type=STRING col=2 offset=88
    null=(offset=1 mask=40)), Slot(id=3 type=STRING col=3 offset=104
    null=(offset=1 mask=80)), Slot(id=4 type=STRING col=4 offset=120
    null=(offset=2 mask=1)), Slot(id=5 type=STRING col=5 offset=136
    null=(offset=2 mask=2)), Slot(id=6 type=STRING col=6 offset=152
    null=(offset=2 mask=4)), Slot(id=7 type=INT col=7 offset=4 null=(offset=0
    mask=1)), Slot(id=8 type=INT col=8 offset=8 null=(offset=0 mask=2)),
    Slot(id=9 type=INT col=9 offset=12 null=(offset=0 mask=4)), Slot(id=10
    type=INT col=10 offset=16 null=(offset=0 mask=8)), Slot(id=11 type=INT
    col=11 offset=20 null=(offset=0 mask=10)), Slot(id=12 type=INT col=12
    offset=24 null=(offset=0 mask=20)), Slot(id=13 type=INT col=13 offset=28
    null=(offset=0 mask=40)), Slot(id=14 type=INT col=14 offset=32
    null=(offset=0 mask=80)), Slot(id=15 type=INT col=15 offset=36
    null=(offset=1 mask=1)), Slot(id=16 type=INT col=16 offset=40
    null=(offset=1 mask=2)), Slot(id=17 type=INT col=17 offset=44
    null=(offset=1 mask=4)), Slot(id=18 type=INT col=18 offset=48
    null=(offset=1 mask=8))])
    I0322 09:17:43.721863 13365 client-cache.cc:68] GetClient(): creating
    client for 10.219.197.8:22000
    I0322 09:17:43.722115 13366 plan-fragment-executor.cc:207] Open():
    instance_id=82a22caa76ed4ace:8b95a9e078a99be6
    I0322 09:17:43.734423 13369 coordinator.cc:1003] Backend 2 completed, 2
    remaining: query_id=82a22caa76ed4ace:8b95a9e078a99be3
    I0322 09:17:43.742041 13369 coordinator.cc:1012]
    query_id=82a22caa76ed4ace:8b95a9e078a99be3: first in-progress backend:
    10.219.197.8:22000
    I0322 09:17:43.742085 13297 coordinator.cc:1003] Backend 0 completed, 1
    remaining: query_id=82a22caa76ed4ace:8b95a9e078a99be3
    I0322 09:17:43.742099 13297 coordinator.cc:1012]
    query_id=82a22caa76ed4ace:8b95a9e078a99be3: first in-progress backend:
    10.219.197.8:22000
    I0322 09:17:43.742933 13370 plan-fragment-executor.cc:207] Open():
    instance_id=82a22caa76ed4ace:8b95a9e078a99be4
    I0322 09:17:43.937532 13157 impala-beeswax-server.cc:272]
    get_results_metadata(): query_id=82a22caa76ed4ace:8b95a9e078a99be3

    Please provide help to improve Impala performance.
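    As a starting point for the performance question, the glog timestamps in the log above already give a rough end-to-end latency: the gap between the query() line and the get_results_metadata() line for the same query_id. A minimal Python sketch (the log lines are copied from the output above; the parsing format is an assumption based on the standard glog prefix):

    ```python
    from datetime import datetime

    def glog_time(line):
        # glog lines look like "I0322 09:17:42.586603 13157 file.cc:138] ..."
        # token 0 is severity+MMDD, token 1 is HH:MM:SS.microseconds
        _, ts = line.split()[:2]
        return datetime.strptime(ts, "%H:%M:%S.%f")

    start = glog_time("I0322 09:17:42.586603 13157 impala-beeswax-server.cc:138] query():")
    end = glog_time("I0322 09:17:43.937532 13157 impala-beeswax-server.cc:272] get_results_metadata():")
    print((end - start).total_seconds())  # ~1.35 s from query submission to results metadata
    ```

    For this trace that works out to roughly 1.35 seconds, most of it spent in planning and metadata loading rather than the scan itself; the "locality percentage 100% (3 out of 3)" line shows scan assignment is already fully local.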

    Thanks,
    Anil.

Discussion Overview
group: cm-users
categories: hadoop
posted: Mar 6, '13 at 11:22a
active: Mar 22, '13 at 3:54a
posts: 20
users: 4
website: cloudera.com
irc: #hadoop