First, thanks again for responding. I saw that Katta's search engine already
supports full-text search, using PDFBox to search and index PDF files ;) I will
study your training videos tonight to learn how to implement the XML job :))
2009/6/15 Alex Loddengaard <alex@cloudera.com>
Well, you define what your job does, but I expect that nearly all MR jobs do
their parsing in the mapper, not in the reducer. You may find these two
videos useful:
<http://www.cloudera.com/hadoop-training-mapreduce-hdfs>
<http://www.cloudera.com/hadoop-training-programming-with-hadoop>
Hope this helps!
Alex
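To illustrate where the parsing goes, here is a minimal plain-Java sketch that simulates the map, shuffle and reduce steps in memory. It is not Hadoop's `Mapper`/`Reducer` API and has no Hadoop dependency; the class and method names (`MapSideParsing`, `map`, `reduce`, `run`) are hypothetical. The mapper turns each raw input line into key/value pairs, and the reducer only aggregates values that were already parsed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapSideParsing {

    // Hypothetical mapper: parses one raw record (a line of text) into (word, 1) pairs.
    // All the parsing work happens here, once per input record.
    static List<Map.Entry<String, Integer>> map(String rawLine) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : rawLine.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Hypothetical reducer: receives already-parsed values grouped by key and just sums them.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Simulates the framework: run map on every record, group values by key
    // (the "shuffle"), then run reduce once per key. TreeMap keeps keys sorted.
    static Map<String, Integer> run(List<String> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records) {
            for (Map.Entry<String, Integer> kv : map(record)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("Hadoop parses in the mapper", "the reducer aggregates");
        // prints {aggregates=1, hadoop=1, in=1, mapper=1, parses=1, reducer=1, the=2}
        System.out.println(run(input));
    }
}
```

Since there is typically one map task per input split, doing the parsing in the mapper spreads that work across the cluster; the reducer then only sees compact, already-parsed values.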
On Sat, Jun 13, 2009 at 1:42 AM, Alexandre Jaquet <alexjaquet@gmail.com>
wrote:
Thanks Alex,
Is parsing the documents a task done in the reducer? That is, do we collect
the data (document input) in a mapper and then parse it?
Thanks in advance
Alexandre Jaquet
2009/6/13 Alex Loddengaard <alex@cloudera.com>
When you refer to "filesystem," do you mean HDFS?
It's very common to store lots of text files in HDFS and run multiple jobs
to process / learn about those text files. As for XML support, you can use
Java libraries (or Python libraries if you're using Hadoop streaming) to
parse the XML; Hadoop itself doesn't have much XML support. I hope this
answers your question.
Alex
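As a sketch of what "use Java libraries to parse the XML" could look like inside a map function, the following uses the JDK's built-in DOM parser (`javax.xml.parsers.DocumentBuilderFactory`) to parse one XML record and emit (element name, text) pairs for its leaf elements. It assumes each input record holds one complete, well-formed XML document; how such records are cut out of large files is left out. `XmlRecordParser` and `parseRecord` are hypothetical names, not part of Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlRecordParser {

    // Parses one XML record (assumed complete and well-formed) and returns
    // (elementName, text) pairs for every leaf element, as a mapper might emit them.
    static List<Map.Entry<String, String>> parseRecord(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations to guard against external-entity (XXE) attacks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            List<Map.Entry<String, String>> out = new ArrayList<>();
            collectLeaves(doc.getDocumentElement(), out);
            return out;
        } catch (Exception e) {
            // A real job might count and skip malformed records instead of failing.
            throw new IllegalArgumentException("Malformed XML record", e);
        }
    }

    // Depth-first walk: an element with no child elements is treated as a leaf.
    private static void collectLeaves(Element element, List<Map.Entry<String, String>> out) {
        NodeList children = element.getChildNodes();
        boolean hasChildElement = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                collectLeaves((Element) child, out);
            }
        }
        if (!hasChildElement) {
            out.add(Map.entry(element.getTagName(), element.getTextContent().trim()));
        }
    }

    public static void main(String[] args) {
        String record = "<doc><title>Report</title><author>Alice</author></doc>";
        for (Map.Entry<String, String> kv : parseRecord(record)) {
            System.out.println(kv.getKey() + "\t" + kv.getValue());
        }
    }
}
```

With Hadoop streaming, the same idea applies with a Python XML library in the mapper script instead.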
On Fri, Jun 12, 2009 at 1:31 PM, Alexandre Jaquet
wrote:
Hi,
Will Hadoop and MapReduce allow me to parse a large quantity of open XML
files distributed inside the same filesystem, but using multiple ...?
Thx
Alexandre Jaquet