Shahab Mehmandoust
at Oct 31, 2008 at 10:04 pm
Currently I'm just researching, so I'm playing with the idea of streaming log data into HDFS.
I'm confused about: "...all you need is a Hadoop install. Your production node doesn't need to be a datanode." If my production node is *not* a datanode, then how can I do "hadoop dfs -put"?
I was under the impression that when I install HDFS on a cluster, each node in the cluster is a datanode.
Shahab
On Fri, Oct 31, 2008 at 1:46 PM, Norbert Burger wrote:
What are you using to "stream logs into the HDFS"? If the command-line tools (i.e., "hadoop dfs -put") work for you, then all you need is a Hadoop install. Your production node doesn't need to be a datanode.
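Roughly, that client-only setup looks like this (the hostnames and paths below are placeholders, not anything from this thread): install the same Hadoop release on the production box, point fs.default.name in its conf/hadoop-site.xml at the cluster's namenode (e.g. hdfs://namenode.example.com:9000), and don't start any daemons there. The command-line client then talks to the remote cluster:

  # run from the production box; no datanode or tasktracker daemons running here
  hadoop dfs -put /var/log/myapp/access.log.2008-10-31 /logs/access.log.2008-10-31
  hadoop dfs -ls /logs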
On Fri, Oct 31, 2008 at 2:35 PM, shahab mehmandoust <shahab53@gmail.com> wrote:
I want to stream data from logs into HDFS in production, but I do NOT want my production machine to be a part of the computation cluster. The reason I want to do it this way is to take advantage of HDFS without putting computation load on my production machine. Is this possible?
Furthermore, is this unnecessary because the computation would not put a significant load on my production box (obviously it depends on the map/reduce implementation, but I'm asking in general)?
I should note that our prod machine hosts our core web application and database (saving up for another box :-).
Thanks,
Shahab
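
One simple way to handle the "streaming" part under that setup is a cron job on the production box that pushes each rotated log into HDFS. A rough sketch, assuming daily rotation (the log names and HDFS paths are made up for illustration):

  #!/bin/sh
  # copy yesterday's rotated access log from the client-only production box into HDFS
  DAY=`date -d yesterday +%Y-%m-%d`
  LOG=/var/log/myapp/access.log.$DAY
  # create the target directory (ignore the error if it already exists)
  hadoop dfs -mkdir /logs/myapp 2>/dev/null
  if [ -f "$LOG" ]; then
      hadoop dfs -put "$LOG" /logs/myapp/access.log.$DAY
  fi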