Hi Alex/ Group,



Thanks for your response. Is there something called a "Hadoop client"? Google
does not suggest one to me!



Should this Hadoop client/Hadoop be installed and configured as we did with
Hadoop on a server? And will this Hadoop client occupy memory/disk space
for running data/name nodes and slaves?



Thank You,

Shravan Kumar. M

Catalytic Software Ltd. [SEI-CMMI Level 5 Company]

-----------------------------

This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you have received this email in error please notify the system
administrator -
netopshelpdesk@catalytic.com

_____

From: Alex Loddengaard
Sent: Thursday, July 09, 2009 11:19 PM
To: shravan.mahankali@catalytic.com
Cc: common-user@hadoop.apache.org
Subject: Re: how to use hadoop in real life?



Writing a Java program that uses the API is basically equivalent to
installing a Hadoop client and writing a Python script to manipulate HDFS and
fire off a MR job. It's up to you to decide how much you like Java :).
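For the HDFS-manipulation half of that equivalence, a minimal Java sketch using the FileSystem API of the Hadoop 0.18/0.20 line might look like the following (the NameNode host/port and all paths here are hypothetical, and the hadoop-core jar must be on the classpath):

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        // Point the client at the remote NameNode (machine.Y:9000 is an assumption;
        // use whatever fs.default.name your cluster is configured with).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://machine.Y:9000"), conf);

        // Create a working directory in HDFS and upload a local file into it.
        Path input = new Path("/user/shravan/input");
        fs.mkdirs(input);
        fs.copyFromLocalFile(new Path("/tmp/actions.log"),
                             new Path(input, "actions.log"));

        // After a job has run, pull its output back to the local machine.
        fs.copyToLocalFile(new Path("/user/shravan/output/part-00000"),
                           new Path("/tmp/results.txt"));
    }
}
```

Nothing in this sketch runs a daemon locally; it is only a client talking to the remote cluster over RPC, which is the sense in which "installing a Hadoop client" does not mean running your own data/name nodes.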

Alex

On Thu, Jul 9, 2009 at 2:27 AM, Shravan Mahankali
wrote:

Hi Group,

I have data to be analyzed, and I would like to dump this data to Hadoop from
machine.X, whereas Hadoop is running on machine.Y. After dumping this data
to Hadoop, I would like to initiate a job, get this data analyzed, and get the
output information back to machine.X.

I would like to do all this programmatically, and am going through the Hadoop
API for that purpose. I remember Alex saying the other day to install Hadoop
on machine.X, but I was not sure why to do that.

I could simply write a Java program including the Hadoop-core jar. I was
planning to use "FsUrlStreamHandlerFactory" to connect to Hadoop on machine.Y
and then use "org.apache.hadoop.fs.shell" to copy data to the Hadoop machine,
initiate the job, and get the results.
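As a sketch of that plan (hostnames, ports, and paths are assumptions, and the identity mapper/reducer stand in for real analysis classes): rather than driving the FsShell classes, which are really meant for the command line, the usual route in the 0.18/0.20 API is to configure a JobConf against the remote cluster and submit it with JobClient:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class RemoteJobSketch {
    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(RemoteJobSketch.class);
        job.setJobName("analyze-user-actions");

        // Point at the remote cluster; these host:port values are hypothetical
        // and should match the cluster's own configuration.
        job.set("fs.default.name", "hdfs://machine.Y:9000");
        job.set("mapred.job.tracker", "machine.Y:9001");

        // IdentityMapper/IdentityReducer just pass records through; replace
        // them with your own analysis classes.
        job.setMapperClass(IdentityMapper.class);
        job.setReducerClass(IdentityReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job, new Path("/user/shravan/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/shravan/output"));

        // Blocks until the job finishes on the cluster.
        JobClient.runJob(job);
    }
}
```

The output directory must not exist before the job runs; afterwards the results can be copied back to machine.X with the FileSystem API.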

Please advise.

Thank You,

Shravan Kumar. M
Catalytic Software Ltd. [SEI-CMMI Level 5 Company]

-----Original Message-----

From: Shravan Mahankali
Sent: Thursday, July 09, 2009 10:35 AM
To: common-user@hadoop.apache.org

Cc: 'Alex Loddengaard'
Subject: RE: how to use hadoop in real life?

Thanks for the information Ted.

Regards,
Shravan Kumar. M
Catalytic Software Ltd. [SEI-CMMI Level 5 Company]

-----Original Message-----
From: Ted Dunning
Sent: Wednesday, July 08, 2009 10:48 PM
To: common-user@hadoop.apache.org; shravan.mahankali@catalytic.com
Cc: Alex Loddengaard
Subject: Re: how to use hadoop in real life?

In general, Hadoop is simpler than you might imagine.

Yes, you need to create directories to store data. This is much lighter
weight than creating a table in SQL.

But the key question is volume. Hadoop makes some things easier and Pig
queries are generally easier to write than SQL (for programmers ... not for
those raised on SQL), but, overall, map-reduce programs really are more work
to write than SQL queries until you get to really large scale problems.

If your database has less than 10 million rows or so, I would recommend that
you consider doing all analysis in SQL augmented by procedural languages.
Only as your data goes beyond 100 million to a billion rows do the clear
advantages of map-reduce formulation become apparent.
On Tue, Jul 7, 2009 at 11:35 PM, Shravan Mahankali wrote:

Use Case: We have a web app where users perform some actions; we have to
track these actions and various parameters related to the action initiator,
and we currently store this information in the database. But our manager has
suggested evaluating Hadoop for this scenario. However, I am not clear
whether every time I run a job in Hadoop I have to create a directory, or how
I can track that later to read the data analyzed by Hadoop. Even though I
drop user action information into Hadoop, I have to put this information in
our database so that it knows the trend and responds to various requests
accordingly.

Discussion Overview
group: common-user @ hadoop
posted: Jul 6, '09 at 12:26p
active: Jul 10, '09 at 5:46a
posts: 13
users: 5
website: hadoop.apache.org...
irc: #hadoop
