
40 discussions - 130 posts

  • Hello guys, I would like to know how to do file uploads in HDFS using Java. Is it to be done using MapReduce? What if I have a large number of small files, should I use a sequence file along with map ...
    Visioner sadak
    Oct 4, 2011 at 6:54 pm
    Oct 13, 2011 at 6:12 pm
  • Hi all, I am currently encountering a tough problem: my job uses MultipleOutputFormat to output results into different folders, and I have to use a combiner to enhance performance. In this situation, ...
    Xin Jing
    Oct 28, 2011 at 4:36 am
    Oct 28, 2011 at 1:44 pm
  • Hi, Does the streaming jar create 1 reducer by default? We have reduce tasks per task tracker configured to be more than 1, but my job has about 150 mappers and only 1 reducer: reducer.py basically just ...
    Mapred Learn
    Oct 21, 2011 at 10:00 pm
    Oct 22, 2011 at 3:13 pm
  • Hi everybody, we have the following scenario: our clustered web application needs to write records to HBase, and we need to support a very high throughput; we expect up to 10-30 thousand requests per ...
    Andreas Reiter
    Oct 28, 2011 at 2:08 pm
    Nov 16, 2011 at 7:40 pm
  • Hi, Hadoop tasks are always stacked to form a linear user-managed workflow (a reduce step cannot start before all previous mappers have stopped etc). This may be problematic in recursive tasks: for ...
    Yaron Gonen
    Oct 4, 2011 at 3:46 pm
    Oct 5, 2011 at 1:51 pm
  • Hi, found that several people have run into this issue, but I was not able to find a solution yet. We have reduce tasks that leave a hanging "child" process. The implementation uses a lot of third ...
    Henning Blohm
    Oct 27, 2011 at 9:59 am
    Nov 17, 2011 at 4:18 pm
  • Hello, I have a situation where I am reading a big file from HDFS and then comparing all the data in that file with each input to the mapper. Now since my mapper is trying to read the entire HDFS ...
    Arko Provo Mukherjee
    Oct 31, 2011 at 11:45 pm
    Nov 1, 2011 at 3:52 am
  • Hi, I have a situation where I have to read a large file into every mapper. Since it's a large HDFS file that is needed to work on each input to the mapper, it is taking a lot of time to read the data ...
    Arko Provo Mukherjee
    Oct 27, 2011 at 8:22 am
    Oct 31, 2011 at 11:41 pm
  • Hi, I am having a problem implementing unsort for a crawler in map/reduce. I have a list of URLs waiting to fetch; they need to be reordered for maximum distance between URLs from one domain. The idea is to ...
    Radim Kolar
    Oct 25, 2011 at 10:46 am
    Oct 27, 2011 at 9:38 am
  • Hi. I'm currently running a Hadoop cluster on Amazon's EMR service, which appears to be the 0.20.2 codebase plus several patches from the (deprecated?) 0.20.3 branch. I'm interested in switching from ...
    Kai Ju Liu
    Oct 26, 2011 at 9:55 pm
    Oct 26, 2011 at 10:43 pm
  • We have a small cluster with HDFS running on only 8 nodes - I believe that the partition assigned to HDFS might be getting full and wonder if the web tools or Java API have a way to look at free ...
    Steve Lewis
    Oct 15, 2011 at 12:16 am
    Oct 17, 2011 at 3:55 pm
  • Can anyone confirm whether the skip options work for MR jobs using the new API? I have a job using the new API and I cannot get the job to skip corrupted records. I tried configuring job properties ...
    Justin Woody
    Oct 12, 2011 at 4:37 pm
    Oct 14, 2011 at 12:00 pm
  • Hi, The map method in the Mapper gets as a parameter a single line from the split. Is there a way for Mappers to get the whole split as input? I'd like to scan the whole split before I decide which ...
    Yaron Gonen
    Oct 12, 2011 at 9:13 am
    Oct 12, 2011 at 7:14 pm
  • My map task needs to handle a large gzipped file and sits at 0% forever until it hits 100%. There is no way to split the file, but it would be nice if there were some indication of progress - any way ...
    Steve Lewis
    Oct 26, 2011 at 12:56 am
    Nov 3, 2011 at 6:22 pm
  • Hi Experts, I'm really interested in understanding the end-to-end flow, functionality, components and protocols in MRv2. Currently I don't know anything on MRv2, so I require some document that would ...
    Bejoy KS
    Oct 13, 2011 at 2:24 pm
    Oct 14, 2011 at 3:18 pm
  • We are planning to enable secure Hadoop using Kerberos. Our users reside in the active directory. We read that there are two options to use Kerberos for securing Hadoop. 1) You run Kerberos on ...
    Bigbibguy father
    Oct 1, 2011 at 2:20 am
    Oct 3, 2011 at 8:45 pm
  • Hello everybody, I am working on Terrier (www.terrier.org), an IR toolkit that leverages Hadoop for indexing large amounts of data (i.e. documents). I am working both locally with a small subset of the ...
    Marco Didonna
    Oct 27, 2011 at 4:43 pm
    Oct 29, 2011 at 7:39 am
  • Hi, Could somebody point me to a chained map-red example? I'm trying to run another map-only job after a map-red job. Thanks, JJ Sent from my iPhone
    Mapred Learn
    Oct 28, 2011 at 2:34 pm
    Oct 29, 2011 at 4:49 am
  • Hey, I set up a hadoop cluster on EC2 using this documentation: http://wiki.apache.org/hadoop/AmazonEC2 OS: Linux Fedora 8 Hadoop version is java version "1.7.0_01" heap size: 1Gb (stats ...
    Artem Yankov
    Oct 25, 2011 at 5:56 pm
    Oct 27, 2011 at 8:42 am
  • I am relatively new here and starting the CDH3u1 (on vmware). The nameserver is not coming up due to the following error: 2011-10-25 22:47:00,547 INFO org.apache.hadoop.hdfs.server.common.Storage: ...
    Stephen Boesch
    Oct 26, 2011 at 6:19 am
    Oct 26, 2011 at 10:48 am
  • I have an MR task which runs well with a single input file or an input directory with dozens of 50MB input files. When the data is in a single input file of 1 GB or more the mapper never gets to 0%. ...
    Steve Lewis
    Oct 14, 2011 at 3:24 pm
    Oct 14, 2011 at 4:03 pm
  • Hi, all, I am trying to write an application which needs the mapper to split its output to file and reducer. For example, if a mapper produces two key-value pairs (a, 1) and (b, 2), how can I write ...
    Ke Zhai
    Oct 12, 2011 at 7:01 pm
    Oct 12, 2011 at 7:10 pm
  • Hi, Is there a way to stop an entire job when a certain condition is met in the map/reduce function? Like looking for a particular key or value. Thanks, Praveen
    Praveen Sripati
    Oct 1, 2011 at 3:39 am
    Oct 1, 2011 at 5:06 am
  • While I can see file sizes with the web interface, it is very difficult to tell which directories are taking up space especially when nested by several levels -- Steven M. Lewis PhD 4221 105th Ave NE ...
    Steve Lewis
    Oct 26, 2011 at 12:52 am
    Oct 26, 2011 at 1:00 am
  • Hello All, I have a few questions concerning the TaskTracker's JVM re-use that I couldn't unearth some details about: Is the configured amount of tasks for reuse a suggestion or will it actually use ...
    Adam Shook
    Oct 25, 2011 at 5:19 pm
    Oct 25, 2011 at 7:05 pm
  • Hello Mark, Moving to mapreduce-dev@ (bcc'd common-user@). You need to control whatever calls Reporter#setProgress(…). Mostly it's just the RecordReader implementation doing it via ...
    Harsh J
    Oct 20, 2011 at 4:31 am
    Oct 20, 2011 at 4:34 am
  • Hi guys, I'm debugging a pipes program on MapReduce, and trying a debug script to print some debug info. I used the default pipes script under src/c++/pips/debug, and put it on HDFS, create a symlink in ...
    Seven garfee
    Oct 14, 2011 at 7:27 am
    Oct 15, 2011 at 3:02 pm
  • Hello Everyone, I have a particular situation, where I am trying to run Iterative Map-Reduce, where the output files for one iteration are the input files for the next. It stops when there are no new ...
    Arko Provo Mukherjee
    Oct 12, 2011 at 6:55 am
    Oct 12, 2011 at 8:00 am
  • Hi, We have a class hierarchy for the output value for both the mapper as well as the reducer class, as parent (abstract class), child1, child2, … We have a mapper class which is specified with its output value class ...
    Anuja Kulkarni
    Oct 4, 2011 at 4:47 pm
    Oct 5, 2011 at 9:39 am
  • Hi all, I am combing through the tasktracker logs and I found many exception traces in the log. I am wondering if anyone with experience can help me eliminate them. Thanks, Felix 1) Exception in ...
    Felix gao
    Oct 29, 2011 at 7:28 pm
    Oct 29, 2011 at 7:28 pm
  • I think my HDFS may be sick, but when I run jobs on our 8 node cluster I have started seeing 11/10/26 15:42:30 WARN mapred.JobClient: Error reading task output http:// ...
    Steve Lewis
    Oct 26, 2011 at 10:53 pm
    Oct 26, 2011 at 10:53 pm
  • Hi, I am trying to create output files of fixed size by using : -Dmapred.max.split.size=6442450812 (6 Gb) But the problem is that the input Data size and metadata varies and I have to adjust above ...
    Mapred Learn
    Oct 26, 2011 at 12:27 am
    Oct 26, 2011 at 12:27 am
  • I read online (I think this refers to the mapred.tasktracker.map.tasks.maximum and reduce properties): "If you have 1 CPU with 4 cores then setting map to 3 and reduce to 3 would be good ...
    Antonio Paolacci
    Oct 23, 2011 at 10:42 am
    Oct 23, 2011 at 10:42 am
  • Stuti, No need to copy the job jar file to every machine in the cluster. You can copy the jar file to the job tracker and execute it using the 'jar' command. If you want to submit the job from the ...
    Devaraj K
    Oct 18, 2011 at 11:24 am
    Oct 18, 2011 at 11:24 am
  • Moving to MR-user. You will need to make a composite value class that contains the real value and a nonce that indicates which behavior is intended for the given emit. Cheers, Anthony
    Anthony Urso
    Oct 16, 2011 at 1:12 am
    Oct 16, 2011 at 1:12 am
  • In iterated map-reduce, a series of code-identical jobs where the reduce output of one is the map input of the next, there are two synchronization barriers per iteration: one in the middle of each ...
    Mike Spreitzer
    Oct 13, 2011 at 5:12 pm
    Oct 13, 2011 at 5:12 pm
  • Hi, What is the difference between specifying the jar file using JobConf API and the 'hadoop jar' command? JobConf conf = new JobConf(getConf(), getClass()); bin/hadoop jar ...
    Praveen Sripati
    Oct 12, 2011 at 4:58 am
    Oct 12, 2011 at 4:58 am
  • Hi all, I'm trying to use Hadoop MapReduce (new API) in a particular way. What I would like to do is to make it work with an external executable not made for mapreduce (but able to read from hdfs), ...
    Stefano Alberto Russo
    Oct 11, 2011 at 1:08 pm
    Oct 11, 2011 at 1:08 pm
  • I have code to talk to a remote cluster where host = "myhost:" and port = 9000 String connectString = "hdfs://" + host + ":" + port + "/"; try { Configuration config = new Configuration(); ...
    Steve Lewis
    Oct 6, 2011 at 9:20 pm
    Oct 6, 2011 at 9:20 pm
  • I try adding following properties to conf/log4j.properties in hadoop 0.20.2 for testing to get log info in my job log4j.appender.INFO=org.apache.log4j.ConsoleAppender ...
    Thomas Anderson
    Oct 1, 2011 at 9:38 am
    Oct 1, 2011 at 9:38 am
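
One thread above quotes connect-string code where host = "myhost:" already ends in a colon, so "hdfs://" + host + ":" + port produces a double colon ("hdfs://myhost::9000/"). A minimal standalone sketch of building the connect string safely with java.net.URI, which validates the authority for us (the host and port values are examples, and the trailing-colon tolerance is a hypothetical convenience, not part of any Hadoop API):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ConnectString {
    // Build an hdfs:// connect string, stripping any trailing colon
    // the caller may have left in the host value.
    static String connectString(String host, int port) throws URISyntaxException {
        String cleanHost = host.endsWith(":")
                ? host.substring(0, host.length() - 1)
                : host;
        // The multi-argument URI constructor assembles and validates
        // scheme://host:port/path in one step.
        return new URI("hdfs", null, cleanHost, port, "/", null, null).toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(connectString("myhost:", 9000)); // hdfs://myhost:9000/
    }
}
```

The resulting string would then be handed to something like FileSystem.get(URI.create(connectString), config) when talking to the remote cluster.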
Group Overview
group: mapreduce-user @

52 users for October 2011

Harsh J: 13 posts
Mapred Learn: 7 posts
Steve Lewis: 7 posts
Visioner sadak: 7 posts
Arko Provo Mukherjee: 6 posts
Joey Echeverria: 6 posts
Brock Noland: 5 posts
Justin Woody: 5 posts
Yaron Gonen: 5 posts
Bejoy KS: 4 posts
Xin Jing: 4 posts
Arun Murthy: 3 posts
Kai Ju Liu: 3 posts
Praveen Sripati: 3 posts
Radim Kolar: 3 posts
Artem Yankov: 2 posts
Bigbibguy father: 2 posts
Devaraj Das: 2 posts
Friso van Vollenhoven: 2 posts
GOEKE, MATTHEW (AG/1000): 2 posts