FAQ
My task is:

1) Import the data from MS SQL Server into HDFS using Sqoop.

2) Process the data with Hive and generate the result in one table.

3) Export that result table from Hive back to MS SQL Server.

I want to perform all of this using Amazon Elastic MapReduce.
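The three steps above can be sketched roughly as follows. The host, credentials, table names, directories, and script name (SQLSERVER_HOST, mydb, source_table, result_table, process.hql) are illustrative placeholders, not values from the post:

```shell
# Step 1: import one table from MS SQL Server into HDFS with Sqoop.
sqoop import \
  --connect "jdbc:sqlserver://SQLSERVER_HOST:1433;databaseName=mydb" \
  --username myuser --password-file /user/hadoop/.dbpass \
  --table source_table \
  --target-dir /data/source_table \
  --num-mappers 4

# Step 2: run the Hive script that joins the imported tables and
# writes the result into one Hive table.
hive -f process.hql

# Step 3: export the result table's HDFS directory back to MS SQL Server.
sqoop export \
  --connect "jdbc:sqlserver://SQLSERVER_HOST:1433;databaseName=mydb" \
  --username myuser --password-file /user/hadoop/.dbpass \
  --table result_table \
  --export-dir /user/hive/warehouse/result_table
```

Step 1 would be repeated (or scripted in a loop) for each of the 30 tables; `--num-mappers` controls how many parallel connections Sqoop opens against SQL Server.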


The data I am importing from MS SQL Server is quite large: about 500,000 rows
per table, across 30 tables. I have written a Hive task that consists only of
queries, and each query uses many joins. Because of this, performance on my
single local machine is very poor (it takes about 3 hours to run completely).
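Before adding machines, the join-heavy queries themselves can often be made cheaper. A few standard Hive settings frequently help here; exact availability and defaults depend on the Hive version, and `process.hql` below is a placeholder for the actual script:

```shell
# Run the join-heavy Hive script with settings that commonly speed up joins:
#   hive.exec.parallel            - run independent query stages concurrently
#   hive.auto.convert.join        - turn joins against small tables into map joins
#   mapred.job.reuse.jvm.num.tasks - reuse JVMs across many short tasks
hive --hiveconf hive.exec.parallel=true \
     --hiveconf hive.auto.convert.join=true \
     --hiveconf mapred.job.reuse.jvm.num.tasks=10 \
     -f process.hql
```

The same properties can instead be placed as `SET` statements at the top of the script itself, which keeps the tuning with the queries it applies to.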

I want to reduce that time as much as possible, so we have decided to use
Amazon Elastic MapReduce. I am currently using 3 m1.large instances, yet
performance is the same as on my local machine.

To improve performance, how many instances should I use? As I add instances,
are they configured automatically, or do I need to specify something when
submitting the JAR for execution? Even with two machines the runtime is
unchanged.

Also, is there any other way to improve performance, or is increasing the
number of instances the only option? Or am I doing something wrong when
executing the JAR?
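For context on the instance-count question: the cluster size is specified once, when the cluster (job flow) is created, and EMR then configures the nodes automatically. The sketch below uses the current `aws emr` CLI for illustration; the 2012-era `elastic-mapreduce` Ruby client took equivalent instance-count and instance-type options, and the exact instance type and release shown here would need to be a combination EMR actually supports:

```shell
# Create an EMR cluster with Hive installed and an explicit node count.
# Name, instance type, count, and release label are illustrative values.
aws emr create-cluster \
  --name "hive-processing" \
  --applications Name=Hive \
  --instance-type m1.large \
  --instance-count 6 \
  --use-default-roles
```

Note that extra nodes only help if the job actually generates enough parallel tasks: a Hive query that funnels everything through a single reducer will take roughly the same time on 2 nodes as on 10, which may explain why doubling the machines changed nothing.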

Please guide me on this, as I don't know much about the Amazon servers.

Thanks.


--
Regards,
Bhavesh Shah

Discussion Overview
group: common-dev
categories: hadoop
posted: May 4, '12 at 6:52a
active: May 4, '12 at 6:52a
posts: 1
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion: Bhavesh Shah (1 post)
