On Nov 24, 2012, at 7:08 AM:
I looked at the warehouse directory via my master node's web UI at port 50070. There I saw Replication with value *3* and Block Size with value *128 MB*.
I also went to masternode:50030/jobtracker.jsp -> job -> map (where the details of map and reduce are given) -> All Tasks -> selected one task from the given list -> Machine. In the table that shows up at the end of that navigation, after selecting a particular job, the Machine field had the value /default/master.mydomain.com.
Do let me know the other ways to figure this out that you mentioned.
My file is around 4.4 GB.
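For what it's worth, with those numbers the expected block layout is easy to sanity-check with a back-of-the-envelope calculation (a rough sketch only; `hadoop fsck` or the namenode UI reports the real layout):

```python
import math

FILE_BYTES = int(4.4 * 1024**3)   # ~4.4 GB file
BLOCK_BYTES = 128 * 1024**2       # 128 MB block size
REPLICATION = 3                   # replication factor seen in the UI

# HDFS splits the file into fixed-size blocks; the last one may be partial.
blocks = math.ceil(FILE_BYTES / BLOCK_BYTES)

# Each block is stored REPLICATION times across the cluster.
raw_gb = FILE_BYTES * REPLICATION / 1024**3

print(blocks)   # → 36 blocks
print(raw_gb)   # ~13.2 GB of total cluster storage
```

So a healthy cluster should show those ~36 blocks spread across the datanodes, not one whole copy of the file per node.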
On Fri, Nov 23, 2012 at 4:14 PM, Deepak Tiwari wrote:
Moving this to cdh-user.
Well, I would rather say that Hadoop divides a file into multiple blocks, and the block size is configurable; each block is then replicated to other nodes. How big is your file? If you go to the namenode, then go to the DFS browser and navigate to your file, you can see how and where the blocks are placed. You mentioned that you see it running only on the master node (probably). How did you check that?
On Fri, Nov 23, 2012 at 4:02 PM, Mehal Patel wrote:
I know that Hadoop divides the whole file into multiple blocks and distributes them to the slave nodes. But in my case I see the whole (original) file replicated onto each slave node's warehouse directory; that is, I don't see small chunks of the file on the datanodes.
Do you know why this is the case? Is there any configuration that has to be done in order to get that kind of distribution?
On Fri, Nov 23, 2012 at 4:00 PM, Mehal Patel wrote:
I see that all jobs run on only one node, the master. I have also included the master node as one of the datanodes.
Can you please tell me how to configure Hadoop so that the slave nodes are also part of the execution of Hive SQL queries?
Also, when I execute a complex query, the job creates around 20 map tasks and 5 reduce tasks, but they are probably getting created only on the master node.
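As an aside, the map-task count is driven by the number of input splits, not by the number of nodes. A rough sketch of the standard FileInputFormat split-size rule (parameter names here are illustrative, not exact Hadoop config keys):

```python
import math

def split_size(block_bytes, min_split=1, max_split=float("inf")):
    # FileInputFormat: splitSize = max(minSize, min(maxSize, blockSize))
    return max(min_split, min(max_split, block_bytes))

def map_tasks(file_bytes, block_bytes, min_split=1, max_split=float("inf")):
    # Roughly one map task per input split of the file.
    return math.ceil(file_bytes / split_size(block_bytes, min_split, max_split))

# With the defaults, a 4.4 GB file and 128 MB blocks give one map per block:
print(map_tasks(int(4.4 * 1024**3), 128 * 1024**2))  # → 36
```

Where those tasks actually run is then up to the jobtracker: it will only schedule them across the slaves if their tasktrackers have registered with it, which is worth checking first.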
On Fri, Nov 23, 2012 at 3:52 PM, Deepak Tiwari wrote:
Hive queries are nothing but a series of MapReduce jobs. You can see the job execution in the jobtracker, where in the job conf you can also see the actual query. When you go to the associated tasks and click on their log output, you can see which node they are running on. (There are other ways to see that too.)
On Fri, Nov 23, 2012 at 2:14 PM, Mehal Patel wrote:
I have a Hadoop cluster of 1 master node and 7 slave nodes, and I am using Hive to analyze my huge data.
I have already imported the data into Hive tables. I am also able to run queries and see correct output.
My question is: how can I know whether all my slave nodes are contributing to executing the Hive SQL queries?
Please let me know if somebody knows.