Hello All,

I have a Hadoop cluster with 1 master node and 7 slave nodes, and I am using
Hive to analyze a large data set.

I have already imported the data into Hive tables, and I am able to run
queries and get correct output.

My question is: how can I tell whether all of my slave nodes are contributing
to the execution of Hive SQL queries?

Please let me know if somebody knows.

Thanks,
Mehal


  • Deepak Tiwari at Nov 23, 2012 at 11:52 pm
    Hive queries are nothing but a series of MapReduce jobs. You can see
    job execution in the JobTracker, where the job conf also shows the actual
    query. When you go to the associated tasks and click on their log output,
    you can see which node they are running on. (There are other ways to see
    that too.)
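The check described above can be turned into a quick tally. This is only a minimal sketch: it assumes you have copied (task, Machine) pairs by hand from the JobTracker task pages, and the task names and the slave host names below are hypothetical placeholders, not values from this cluster.

```python
from collections import Counter


def tasks_per_host(task_machines):
    """Count tasks per host from (task_id, machine) pairs.

    The machine value may carry a rack prefix such as '/default/',
    so only the last path component is used as the host name.
    """
    counts = Counter()
    for _task_id, machine in task_machines:
        host = machine.rsplit("/", 1)[-1]
        counts[host] += 1
    return counts


def idle_hosts(counts, expected_hosts):
    """Return the expected hosts that ran no tasks at all."""
    return [host for host in expected_hosts if counts[host] == 0]


# Hypothetical observations copied from the JobTracker UI.
observed = [
    ("map_000000", "/default/master.mydomain.com"),
    ("map_000001", "/default/master.mydomain.com"),
    ("reduce_000000", "/default/master.mydomain.com"),
]

# Hypothetical list of nodes that should be running tasks.
expected = [
    "master.mydomain.com",
    "slave1.mydomain.com",
    "slave2.mydomain.com",
]

counts = tasks_per_host(observed)
print(dict(counts))
print(idle_hosts(counts, expected))
```

If every task reports the same host, as in this made-up sample, the other nodes are not receiving any work for that job.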
  • Mehal Patel at Nov 24, 2012 at 12:08 am
    Hi Deepak,

    I see that all jobs run on only one node, the master. I have also
    configured the master node as one of the datanodes.
    Can you please tell me how I can configure Hadoop so that the slave
    nodes also take part in executing Hive SQL queries?

    Also, when I execute a complex query, the job creates around 20 map tasks
    and 5 reduce tasks, but they are probably all being created on the
    master node only.

    Thanks,
    Mehal
  • Mehal Patel at Nov 24, 2012 at 12:02 am
    Also,

    I know that Hadoop divides a whole file into multiple pieces and distributes
    them to the slave nodes. But in my case I see the whole original file
    replicated into each slave node's warehouse directory. That is, I don't see
    small chunks of the file on the datanodes.

    Do you know why this is the case? Is there any configuration that has to be
    done to get that kind of distribution?

    Mehal
  • Mehal Patel at Nov 24, 2012 at 7:08 am
    Hi,

    I looked at the warehouse directory through my master node on port 50070.
    There I saw Replication with value *3* and Block size with value *128 MB*.

    I then went to masternode:50030/jobtracker.jsp -> job -> map (where details
    of map and reduce are given) -> All tasks -> selected one task from the
    given list -> Machine. After selecting a particular job and navigating to
    the end, a table shows up that has a Machine field. *Machine* had the value
    /default/master.mydomain.com.

    Do let me know the other ways to figure this out that you mentioned.

    My file is around 4.4 GB.

    Mehal

    On Fri, Nov 23, 2012 at 4:14 PM, Deepak Tiwari wrote:

    Moving it to cdh-user.

    Well, I would rather say that Hadoop divides a file into multiple blocks
    (the block size is configurable), and each block is then replicated to
    other nodes. How big is your file? If you go to the namenode, then to the
    DFS browser, and navigate to your file, you can see how and where the
    blocks are placed. You mentioned that you see it running only on the master
    node (probably). How did you check that?
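As a rough worked example of the block splitting described above, the figures reported in this thread (a file of about 4.4 GB, a 128 MB block size, replication 3) can be plugged into a simple ceiling division. This sketch assumes binary units (1 GB = 1024 MB); since the file size is only approximate, the real block count may differ slightly.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def block_count(file_size_bytes, block_size_bytes):
    """Number of HDFS blocks needed to hold a file (last block may be partial)."""
    return math.ceil(file_size_bytes / block_size_bytes)


# Figures reported in the thread; the file size is approximate.
file_size = int(4.4 * GB)
block_size = 128 * MB
replication = 3

blocks = block_count(file_size, block_size)
replicas = blocks * replication

print(blocks)    # 36 blocks under the binary-unit assumption
print(replicas)  # 108 block replicas spread across the datanodes
```

With a replication factor of 3, each block is stored on three datanodes rather than on all of them, so a single datanode would normally hold only part of the file's blocks.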

  • Mehal Patel at Nov 24, 2012 at 7:50 am
    One more question,

    Is it also possible to see the individual blocks that were created from the
    original huge data file? If yes, how can I see them?

    Mehal
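One common way to list a file's blocks and the datanodes holding them is the `hadoop fsck` tool with the `-files -blocks -locations` options (newer Hadoop releases expose the same tool as `hdfs fsck`). The sketch below only builds and prints that command; the table path is a hypothetical example under the default Hive warehouse directory, and `list_blocks` assumes the `hadoop` binary is on the PATH of the machine where it is called.

```python
import subprocess


def build_fsck_command(hdfs_path):
    """Build the fsck command that reports files, blocks and block locations."""
    return ["hadoop", "fsck", hdfs_path, "-files", "-blocks", "-locations"]


def list_blocks(hdfs_path):
    """Run fsck on a cluster node and return its raw text report.

    Assumes the `hadoop` binary is installed and on the PATH.
    """
    result = subprocess.run(
        build_fsck_command(hdfs_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical table location; substitute your own warehouse path.
TABLE_PATH = "/user/hive/warehouse/my_table"

# Print the command so it can be copied into a shell on the cluster.
print(" ".join(build_fsck_command(TABLE_PATH)))
```

Calling `list_blocks(TABLE_PATH)` on a cluster node would return the report text, which can then be read to see which datanodes hold each block.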

Discussion Overview
group: scm-users @
categories: hadoop
posted: Nov 23, '12 at 10:14p
active: Nov 24, '12 at 7:50a
posts: 6
users: 2
website: cloudera.com
irc: #hadoop

2 users in discussion

Mehal Patel: 5 posts Deepak Tiwari: 1 post
