
47 discussions - 136 posts

  • Hello, I've been getting the following error when trying to run a very simple MapReduce job. Map finishes without problem, but error occurs as soon as it enters Reduce phase. 10/06/24 18:41:00 INFO ...
    Jun 24, 2010 at 7:30 pm
    Jul 9, 2010 at 1:18 pm
  • Hi, 1 - Hadoop MR uses a job.jar to execute MR jobs. But who creates this job.jar, and when is it created? 2 - Is this job.jar only created at run time, during the execution of an MR task? 3 - It's ...
    Pedro Costa
    Jun 24, 2010 at 9:38 pm
    Jun 27, 2010 at 5:15 am
  • Hi all, We have a MapReduce job writing a Lucene index (modeled closely after the example in contrib), and we keep hitting out of memory exceptions in the reduce phase once the number of files grows ...
    Ruben Quintero
    Jun 11, 2010 at 9:20 pm
    Jun 15, 2010 at 5:04 pm
  • Hello everyone, I was wondering if anyone has used Hadoop to train a Conditional Random Field (CRF) model (specifically Stanford Named Entity Recognizer). I am not getting the idea of what should be ...
    Jun 28, 2010 at 10:19 pm
    Jul 12, 2010 at 12:35 pm
  • Problem with Reducer emitting a different Key than Mapper. I have the following code where the Mapper emits a custom Key and the reducer is expected to emit Text. Using Hadoop 0.2 on a local instance ...
    Steve Lewis
    Jun 16, 2010 at 4:16 pm
    Jun 18, 2010 at 10:00 pm
  • I know I can get current InputSplit inside a mapper with InputSplit split = context.getInputSplit(); but is there a way to get a list of all InputSplits? cheers -- Torsten
    Torsten Curdt
    Jun 5, 2010 at 6:15 pm
    Jun 6, 2010 at 11:54 pm
  • when I say hadoop fs -copyFromLocal small_yeast /user/training/small_yeast I get org.apache.hadoop.ipc.RemoteException: java.io.IOException: File ...
    Steve Lewis
    Jun 22, 2010 at 7:56 pm
    Jun 23, 2010 at 12:20 am
  • Hi, I am a newbie to Hadoop. I want to use the multithreaded runner by default, so I tried to change the MapTask.java code. It failed to compile using Ant, as a mapreduce/mapred library conflict was ...
    Jyothish Soman
    Jun 11, 2010 at 5:31 am
    Jun 17, 2010 at 1:11 pm
  • When I set job.setPartitionerClass(MyPartitioner.class); job.setNumReduceTasks(4); I would expect to see my MyPartitioner get called with getPartition(key, value, 4), but I still see it only get called ...
    Torsten Curdt
    Jun 6, 2010 at 5:34 pm
    Jun 7, 2010 at 12:11 am
  • Hi, I'm studying the partition implementation of MR and I have some questions. As I understand from what I've read, the purpose of the partition is to tell each reducer which map output it will ...
    Pedro Costa
    Jun 30, 2010 at 8:30 pm
    Jul 1, 2010 at 2:38 am
  • Our Hadoop streaming jobs are unpredictable. The following illustrates this problem. I reset the system first to start with a clean slate (deleting tmp files on master and all slaves, reformatting ...
    Ratner, Alan S (IS)
    Jun 11, 2010 at 1:36 pm
    Jun 22, 2010 at 2:27 am
  • This class is a copy of a standard WordCount class with one critical exception: instead of the Mapper emitting a key of type Text, it emits a key of type MyText, a simple subclass of Text. The reducer ...
    Steve Lewis
    Jun 18, 2010 at 6:10 pm
    Jun 19, 2010 at 5:56 pm
  • I am trying to develop a streaming MR application by implementing a Korn shell-based mapper and reducer. I want to use 'space - x20' as the separator between key and value throughout the application. ...
    Chinni, Ravi
    Jun 16, 2010 at 3:50 pm
    Jun 17, 2010 at 8:08 pm
  • I am setting some custom values on my job configuration: Configuration conf = new Configuration(); conf.set("job.time.from", time_from); conf.set("job.time.until", time_until); Cluster cluster = new ...
    Torsten Curdt
    Jun 11, 2010 at 12:05 pm
    Jun 11, 2010 at 3:36 pm
  • I need to emit to different output files from a reducer. The old API had MultipleSequenceFileOutputFormat. Am I missing something or is this gone in the new API? Are there any problems porting this ...
    Torsten Curdt
    Jun 7, 2010 at 2:38 pm
    Jun 9, 2010 at 4:44 am
  • Hi, I am trying to write output to a MySQL DB, and I am getting the following error: java.io.IOException at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat.getRecordWriter(DBOutputFormat.java:180) at ...
    Giridhar Addepalli
    Jun 8, 2010 at 9:32 am
    Jun 8, 2010 at 12:38 pm
  • Hi, My input looks like (userid, itemid) as follows: ... 122641863,5060057723326 123441107,9789020282948 ... I tried to write a MapReduce Job with Mapper<Object, Text, IntWritable, IntWritable> that ...
    Laszlo Dosa
    Jun 30, 2010 at 8:42 am
    Jul 2, 2010 at 9:44 am
  • Hi, 1 - Hadoop uses several ports to run. There are ports for HDFS, for the MapReduce JvmTasks, etc. I don't know how to identify all the ports that MapReduce and HDFS use. I'm running the ...
    Pedro Costa
    Jun 23, 2010 at 10:13 pm
    Jun 24, 2010 at 2:00 am
  • Hi all, When I run long running map/reduce jobs the reducers run past 100% before reaching completion. Sometimes as far as up to 140%. I have searched the mailing list and other resources and noticed ...
    Friso van Vollenhoven
    Jun 21, 2010 at 3:45 pm
    Jun 22, 2010 at 10:21 am
  • I was just wondering the other day: What if the values for a key that get passed into the reducer do not fit into memory? After all, a reducer should get all values per key from the whole job. Is ...
    Torsten Curdt
    Jun 22, 2010 at 12:15 am
    Jun 22, 2010 at 7:05 am
  • As some folks have found out the hard way, only the first member of a concatenated gzip file is recognized by current versions of Hadoop, including trunk; the remainder is silently ignored. I'm ...
    Greg Roelofs
    Jun 15, 2010 at 7:56 pm
    Jun 17, 2010 at 8:53 pm
  • Hello, I am running several different mapreduce jobs. For some of them it is better to have a rather high number of running map tasks per node, whereas others do very intensive read operations on our ...
    Alex Munteanu
    Jun 3, 2010 at 8:45 am
    Jun 5, 2010 at 5:57 am
  • Hi, I'm using Cloudera's 0.20.2+228 release. How do I create a custom Counter using the NEW API? In my Mapper class I tried this: public class MyMapper extends Mapper<Object, Text, Text, Text ...
    Some Body
    Jun 29, 2010 at 3:16 pm
    Jun 29, 2010 at 3:56 pm
  • I have a number of files which can be read and converted into a series of lines of text; however, the means of reading the file is not known to the standard Hadoop splitters. I understand that I can ...
    Steve Lewis
    Jun 24, 2010 at 7:45 pm
    Jun 25, 2010 at 4:44 am
  • Assume I have a large file called *BigData.unsorted* (say 500 GB) consisting of lines of text. Assume that these lines are in random order - I understand how to assign a key to lines and that Hadoop ...
    Steve Lewis
    Jun 23, 2010 at 5:15 pm
    Jun 24, 2010 at 5:40 pm
  • Assume I have one of the two situations (I have both) 1) I have a directory with several hundred files - of these some fraction need to be passed to the mapper (say the ones ending in ".foo") and the ...
    Steve Lewis
    Jun 23, 2010 at 5:26 pm
    Jun 23, 2010 at 6:00 pm
  • I am running into an issue that the splitter is reading the wrong file and causing my program to fail - I cannot find which file is being read - context.getConfiguration().get("map.input.file"); ...
    Steve Lewis
    Jun 11, 2010 at 4:14 pm
    Jun 13, 2010 at 3:33 am
  • Hi, I am using the Hadoop 0.20.2 MapReduce framework. By default it writes output to part-r-00000 etc. I want to write to a file with a different name. I am trying to override "getDefaultWorkFile" method in ...
    Giridhar Addepalli
    Jun 10, 2010 at 12:49 pm
    Jun 10, 2010 at 4:27 pm
  • Hi, If I define in mapred-site.xml the property mapred.reduce.tasks to 1, how many reduce tasks will actually run? I think it will run 2 and I don't know why. But in a log that I've added, the two ...
    Jun 9, 2010 at 3:18 pm
    Jun 10, 2010 at 8:00 am
  • Hi, I'm facing difficulty in understanding all the concepts in Hadoop MR. 1 - Input files in MR contain index files. What's the purpose of the index files in Hadoop? 2 - MR uses split files. A split ...
    Jun 9, 2010 at 4:37 pm
    Jun 9, 2010 at 9:17 pm
  • Hi, What's the difference between a TaskTracker and a TaskInProgress? Regards, -- Pedro
    Jun 9, 2010 at 4:30 pm
    Jun 9, 2010 at 5:40 pm
  • Hi, This message is a little long. I beg your patience. Our team would like to tune MR performance by changing parameters in the Hadoop project and the JVM according to the MR job status and results. First, ...
    WANG Shicai
    Jun 2, 2010 at 7:54 am
    Jun 2, 2010 at 8:38 am
  • Moving the thread to the mapreduce-user mailing list. Are you passing a new API mapper? If you are passing a new API mapper and want to use MultipleInputs, it is not supported in branch 0.20. ...
    Amareshwari Sri Ramadasu
    Jun 28, 2010 at 7:26 am
    Jun 28, 2010 at 7:26 am
  • Dear all, I am using Hadoop 0.20.2 with the DistributedCache API. Currently, I figured out that either I use the following ways to add a cached archive from the HDFS to local slaves, the second ...
    Stephen TAK-LON WU
    Jun 23, 2010 at 12:47 pm
    Jun 23, 2010 at 12:47 pm
  • I get the following reports for a small virtual 4 node cluster - I think I might be low on space - especially since I want to upload about 1 gb but am too much of a newbie to know how to raise the ...
    Steve Lewis
    Jun 22, 2010 at 11:33 pm
    Jun 22, 2010 at 11:33 pm
  • Hello again, I ran into another issue with pseudo-distributed mode. I allocated 10 GB for the in-memory filesystem. I ran sort (not terasort) with 10 GB of data, and there was absolutely no effect of the ...
    Jyothish Soman
    Jun 20, 2010 at 8:18 am
    Jun 20, 2010 at 8:18 am
  • This isn't a HBase question, this is for mapreduce-user@hadoop.apache.org J-D
    Jean-Daniel Cryans
    Jun 15, 2010 at 4:01 pm
    Jun 15, 2010 at 4:01 pm
  • We're excited to announce Surge, the Scalability and Performance Conference, to be held in Baltimore on Sept 30 and Oct 1, 2010. The event focuses on case studies that demonstrate successes (and ...
    Jason Dixon
    Jun 15, 2010 at 5:24 am
    Jun 15, 2010 at 5:24 am
  • I have a problem where I am using Java and the Hadoop APIs to run a MapReduce job on data that can be considered a set of lines of text. At the reduce stage I have a collection of lines of text ...
    Steve Lewis
    Jun 11, 2010 at 4:26 pm
    Jun 11, 2010 at 4:26 pm
  • Hi, I saw "Shuffle and Sort Configuration Tuning" in "Hadoop: The Definitive Guide", which told me that each job in the same cluster can use different values for the parameters below without restarting the cluster. ...
    WANG Shicai
    Jun 11, 2010 at 2:48 am
    Jun 11, 2010 at 2:48 am
  • Hadoop 0.21 using the new API. All working. Then I try to use MultipleOutputs in my reducer: private MultipleOutputs<Text, Text> mos; protected void setup(Context context) throws IOException, ...
    Torsten Curdt
    Jun 10, 2010 at 1:23 pm
    Jun 10, 2010 at 1:23 pm
  • Hello Fellow Hadoopists, We are meeting at 7:15 pm on June 17th at the University Heights Community Center 5031 University Way NE Seattle WA 98105 Room #110 We are looking for people to present. So ...
    Sean Jensen-Grey
    Jun 10, 2010 at 1:41 am
    Jun 10, 2010 at 1:41 am
  • Hi, I saw a tuning tool, Vaidya, from Amogh, but there is not enough information about it. Is there any tool in Hadoop for profiling MR jobs? Thank you! Best Wishes, Evan ...
    WANG Shicai
    Jun 7, 2010 at 3:12 am
    Jun 7, 2010 at 3:12 am
  • Hey folks, I have the following keys/lines as input 2010-03-01 11:56/A - 1 2010-03-01 11:57/A - 1 2010-03-01 11:57/A - 1 2010-03-01 11:57/B - 1 2010-03-01 11:58/B - 1 2010-03-01 11:58/A - 1 ...
    Torsten Curdt
    Jun 4, 2010 at 3:28 pm
    Jun 4, 2010 at 3:28 pm
  • I am thinking of the following problem lately. I started thinking of this problem in the following context. I have a predefined budget and I can either -- A) purchase 8 more powerful servers (4cpu x ...
    Sean Bigdatafun
    Jun 3, 2010 at 5:14 am
    Jun 3, 2010 at 5:14 am
  • First International Workshop on Theory and Practice of Mapreduce (MapRed 2010) Indianapolis, USA: Nov 30 - Dec 3, 2010 MapReduce is a programming model, introduced by Google in 2004, to simplify ...
    Milind A Bhandarkar
    Jun 1, 2010 at 4:18 am
    Jun 1, 2010 at 4:18 am
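Two of the threads in this list (the MyPartitioner question and the question about how partitioning decides which map output each reducer receives) hinge on how a key is mapped to a reducer. As a hedged illustration, the standalone Java class below reproduces the formula used by Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. It deliberately avoids Hadoop dependencies: the class name `PartitionSketch` and the sample keys are hypothetical, and plain `String` keys stand in for Writable keys such as `Text`, whose `hashCode()` differs from `String.hashCode()`, so real partition numbers will differ.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, dependency-free sketch of hash partitioning, mirroring the
 * formula of Hadoop's default HashPartitioner. String keys are an assumption
 * standing in for Writable keys; Text.hashCode() gives different values.
 */
public class PartitionSketch {

    /** Returns the reducer index (0 .. numReduceTasks-1) for a key. */
    static int getPartition(String key, int numReduceTasks) {
        // Masking the sign bit keeps the result non-negative, even for
        // keys whose hashCode() is negative or Integer.MIN_VALUE.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // as in job.setNumReduceTasks(4)
        String[] keys = {"2010-03-01 11:56/A", "2010-03-01 11:57/B", "122641863", "hello"};

        // Group the sample keys by the reducer they would be sent to.
        Map<Integer, StringBuilder> byPartition = new TreeMap<>();
        for (String key : keys) {
            int p = getPartition(key, numReduceTasks);
            byPartition.computeIfAbsent(p, k -> new StringBuilder()).append(key).append("; ");
        }
        byPartition.forEach((p, ks) -> System.out.println("reducer " + p + ": " + ks));
    }
}
```

The mask is used instead of `Math.abs` because `Math.abs(Integer.MIN_VALUE)` is still negative, which would yield a negative partition number. With a single reduce task the formula always returns 0.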
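The question about sorting the large, randomly ordered *BigData.unsorted* file touches on a related point: with hash partitioning, each reducer's output is sorted by key, but the concatenation of all output files is not. A common remedy is range partitioning, the idea behind Hadoop's TotalOrderPartitioner: sorted split points (typically chosen by sampling the input) assign contiguous key ranges to reducers, so every key sent to reducer i sorts before every key sent to reducer i+1. The class below is a hypothetical, dependency-free sketch using `String` keys; the class name and split points are made up, and choosing good split points is not shown.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of range (total-order) partitioning. Given
 * numReduceTasks - 1 sorted split points, a key goes to the partition whose
 * range contains it; a key equal to a split point goes to the higher partition.
 */
public class RangePartitionSketch {

    private final String[] splitPoints; // sorted; length = numReduceTasks - 1

    RangePartitionSketch(String[] splitPoints) {
        this.splitPoints = splitPoints.clone();
        Arrays.sort(this.splitPoints);
    }

    int getPartition(String key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns the index if found, else -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        // Two split points -> three reducers: [.., "g"), ["g", "p"), ["p", ..]
        RangePartitionSketch rp = new RangePartitionSketch(new String[] {"g", "p"});
        for (String key : new String[] {"apple", "g", "kiwi", "zebra"}) {
            System.out.println(key + " -> reducer " + rp.getPartition(key));
        }
    }
}
```

Because each reducer receives a contiguous, sorted key range, concatenating part files 0, 1, 2 in order gives a globally sorted result; skewed split points, however, leave some reducers with far more data than others.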
Group: mapreduce-user, June 2010

48 users for June 2010

Steve Lewis: 15 posts, Torsten Curdt: 12 posts, Ted Yu: 9 posts, Psdc1978: 8 posts, Aaron Kimball: 6 posts, Amareshwari Sri Ramadasu: 5 posts, Hemanth Yamijala: 5 posts, Owen O'Malley: 5 posts, Allen Wittenauer: 4 posts, Eric Sammer: 4 posts, James Hammerton: 4 posts, Jyothish Soman: 4 posts, Mohamed Riadh Trad: 4 posts, Ruben Quintero: 4 posts, Giridhar Addepalli: 3 posts, Greg Roelofs: 3 posts, Sonal Goyal: 3 posts, WANG Shicai: 3 posts, Bmdevelopment: 2 posts, Chinni, Ravi: 2 posts