58 discussions - 201 posts

  • Hi, I have a use case where I want to pass a threshold value to a map-reduce job. For eg: error records=10. I want map-reduce job to fail if total count of error_records in the job i.e. all mappers, ...
    Mapred Learn
    Nov 14, 2011 at 11:06 pm
    Nov 18, 2011 at 4:32 am
  • Until now we were manually copying our Jars to all machines in a Hadoop cluster. This worked while our cluster was small. Now our cluster is getting bigger. What's the best way to start a ...
    Something Something
    Nov 16, 2011 at 6:40 am
    Nov 20, 2011 at 2:26 am
  • Hi, I have a cluster of 4 tasktracker/datanodes and 1 JobTracker/Namenode. I can run small jobs on this cluster fine (like up to a few thousand keys) but more than that and I start seeing errors like ...
    Russell Brown
    Nov 4, 2011 at 3:29 pm
    Nov 8, 2011 at 2:27 pm
  • Hi all, I am using Amazon EC2 with their large instances. Amazon claims these large instances have 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each). But according to my ...
    Jiamin Lu
    Nov 24, 2011 at 3:41 pm
    Nov 26, 2011 at 3:31 pm
  • Is the idea of writing business logic in cleanup method of a Mapper good or bad? We think we can make our Mapper run faster if we keep accumulating data in a HashMap in a Mapper, and later in the ...
    Something Something
    Nov 17, 2011 at 4:23 am
    Nov 18, 2011 at 7:13 pm
  • I have created a counter in the mapper to count something, and I want to get the counter's value in the reducer phase. The code segment is as follows: public class MM extends Mapper<LongWritable, Text, Text, Text> { ...
    Nov 30, 2011 at 9:37 am
    Dec 12, 2011 at 5:30 pm
  • Hoot Thompson
    Nov 12, 2011 at 11:32 pm
    Nov 15, 2011 at 3:54 pm
  • Hello, I configured mapred-site.xml with one mapper, still the web ui shows: Map shows capacity 40 = 2 x #machines. I was expecting it to be 20. Any idea? Thanks, Keren Running Map TasksRunning ...
    Keren Ouaknine
    Nov 25, 2011 at 3:16 am
    Nov 30, 2011 at 5:04 am
  • I am just trying to get off the ground with MRv2. The first node (in pseudo distributed mode) is working fine - ran a couple of TeraSort's on it. The second node has a serious issue with its single ...
    Stephen Boesch
    Nov 29, 2011 at 2:44 pm
    Nov 29, 2011 at 5:54 pm
  • My job is dying during a map task write. This happened in enough tasks to kill the job, although most tasks succeeded. Any ideas as to where to start diagnosing the problem? Caused by: ...
    Steve Lewis
    Nov 5, 2011 at 5:10 pm
    Nov 7, 2011 at 8:30 pm
  • Jobid | Priority | User | Name | Map % Complete | Map Total | Maps Completed | Reduce % Complete | Reduce Total | Reduces Completed | Job Scheduling Information | Diagnostic Info ...
    Keren Ouaknine
    Nov 29, 2011 at 3:09 am
    Dec 1, 2011 at 4:09 pm
  • Hi I set up a pseudo cluster according to the instructions here http://www.cloudera.com/blog/2011/11/building-and-deploying-mr2/. Initially the randomwriter example worked. But after a crash on the ...
    Stephen Boesch
    Nov 28, 2011 at 6:40 pm
    Nov 29, 2011 at 12:39 am
  • I have 2 input seq files, 32MB each. I want to run them on as many mappers as possible. I appended -D mapred.max.split.size=1000000 as a command line argument to the job, but there is no difference. Job ...
    Radim Kolar
    Nov 9, 2011 at 11:12 am
    Nov 9, 2011 at 6:46 pm
  • Hi all, Let me preface this with my understanding of how tasks work. If a task takes a long time (default 10min) and demonstrates no progress, the task tracker will decide the process is hung, kill ...
    Christopher Egner
    Nov 4, 2011 at 6:20 am
    Nov 8, 2011 at 4:32 pm
  • Hi Hadoop users, I have been reading about Hadoop metrics framework, and I was wondering is it possible to create custom metrics for specific job. In my use case I want to capture some specific ...
    Dino Kečo
    Nov 2, 2011 at 7:45 am
    Nov 2, 2011 at 3:36 pm
  • Hi all, I was wondering if there is an off-the-shelf solution to re-use the output of a job which was killed when re-running the job? Here's my use-case: Job (with map phase only) is running and has ...
    Samir Eljazovic
    Nov 24, 2011 at 11:54 pm
    Dec 1, 2011 at 12:25 am
  • I am running a job on a cluster launching from a Windows box and "fs.default.name" to point the job to the cluster. Everything works until the last step where I say FileSystem fileSystem = ...
    Steve Lewis
    Nov 21, 2011 at 11:47 pm
    Nov 23, 2011 at 4:36 am
  • Hi, I have a largish job running that, due to the quirks of the third party input format I'm using, has 280,000 map tasks. (I know this is far from ideal but it'll do for me.) I'm passing this ...
    Mat Kelcey
    Nov 20, 2011 at 9:32 pm
    Nov 21, 2011 at 1:21 am
  • Hi All, We have been using dual socket quad core machines for a while and have been running with 8 mappers 2 reducers. The rule of thumb we heard was slightly oversubscribe the number of cores but at ...
    Tom Hall
    Nov 14, 2011 at 2:57 am
    Nov 16, 2011 at 5:30 pm
  • I have a job which takes an xml file - the splitter breaks the file into tags, the mapper parses each tag and sends the data to the reducer. I am using a custom splitter which reads the file looking ...
    Steve Lewis
    Nov 4, 2011 at 3:08 am
    Nov 7, 2011 at 7:59 am
  • I'm a total newbie @ Hadoop and am trying to follow an example (a Useful Partitioner Class) on the Hadoop Streaming Wiki, but with my data. So I have data like this: 520460379 1 14067 759015 1142 3 ...
    Dan Young
    Nov 3, 2011 at 4:52 am
    Nov 3, 2011 at 3:06 pm
  • I have a problem with reduce tasks ending like this: Task attempt_201111250441_0009_r_000001_1 failed to report status for 602 seconds. Killing! Every reduce task running on a particular block ends like ...
    Radim Kolar
    Nov 28, 2011 at 8:32 pm
    Dec 2, 2011 at 5:02 pm
  • Hi, Any work out there on using hbase, hive, pig with MRv2? thx! stephenb
    Stephen Boesch
    Nov 27, 2011 at 1:12 am
    Nov 27, 2011 at 10:13 pm
  • I was wondering if a context.write is possible in the cleanup of a Mapper. For example, if the map function queues the item in an external process, and then in the cleanup phase all the results from ...
    Kim Ebert
    Nov 18, 2011 at 7:14 pm
    Nov 18, 2011 at 8:53 pm
  • Hi, I think the below issue was only due to HDFS architecture and not map-reduce. But just to make sure that's the case, I am cross-posting to this group as well. I have also attached the program ...
    Sudharsan Sampath
    Nov 4, 2011 at 9:42 am
    Nov 9, 2011 at 5:20 pm
  • Hi, What is the difference between setting the mapred.job.map.memory.mb and mapred.child.java.opts using -Xmx to control the maximum memory used by a Mapper and Reduce task? Which one takes ...
    Praveen Sripati
    Nov 6, 2011 at 3:17 pm
    Nov 8, 2011 at 4:20 pm
  • Hello, I am having the following problem with Distributed Caching. *In the driver class, I am doing the following: (/home/arko/MyProgram/data is a directory created as an output of another ...
    Arko Provo Mukherjee
    Nov 8, 2011 at 7:54 am
    Nov 8, 2011 at 11:47 am
  • Hello list, I have a map-only job and I'd like to compress the output (possibly avoiding a re-compression when the map-output gets promoted as final output). I can see 4 ways of obtaining it: 1) by ...
    Claudio Martella
    Nov 7, 2011 at 2:49 pm
    Nov 7, 2011 at 3:00 pm
  • I have been finding that my cluster is running abnormally slowly. A typical reduce task reports reduce copy (113 of 431 at 0.07 MB/s). 70 KB/second is a truly dreadful rate and tasks are running ...
    Steve Lewis
    Nov 4, 2011 at 3:21 pm
    Nov 4, 2011 at 7:26 pm
  • Hi I have a computation to do for a large input - a single large sequence file. Ideally I would like to set a specific number of mappers and designate each to process over a specific range of ...
    Rob Podolski
    Nov 30, 2011 at 6:29 am
    Nov 30, 2011 at 5:21 pm
  • Hi, I've built some SequenceFiles using a custom WritableComparable. I also decided to reorganize package structure and ended up renaming the whole thing. Since the key and value classes are embedded ...
    Markus Jelsma
    Nov 30, 2011 at 3:44 pm
    Nov 30, 2011 at 4:18 pm
  • Hi all, when I start a Job, lots of messages are printed on screen, as follows: Job started: Thu Nov 17 22:15:57 CST 2011 11/11/17 22:15:58 WARN snappy.LoadSnappy: Snappy native library is available ...
    Seven garfee
    Nov 17, 2011 at 2:37 pm
    Nov 17, 2011 at 3:07 pm
  • Who: Dr. Jimmy Lin, University of Maryland, When: 2 December 2011, 11.00 - 12.00 Where: Amolf, 2nd floor, Science Park 104, Amsterdam Directions: http://www.amolf.nl/about-amolf/visitor/#c42 Register ...
    Evert Lammerts
    Nov 17, 2011 at 2:22 pm
    Nov 17, 2011 at 2:30 pm
  • Hi Bangalore Area Hadoop Developers and Users, There is a lot of interest in Hadoop and Big Data space in Bangalore. Many folks have been asking for Bangalore meetups for long. I have just created ...
    Sharad Agarwal
    Nov 17, 2011 at 1:01 pm
    Nov 17, 2011 at 2:13 pm
  • Hi Experts, I'm currently working to incorporate a performance test plan for a series of Hadoop jobs. My entire application consists of map reduce, hive and flume jobs chained one after another and ...
    Bejoy Ks
    Nov 15, 2011 at 7:31 am
    Nov 15, 2011 at 8:50 am
  • Moving it to mapreduce-user. Ronnie, Is jobclient.getalljobs() something that you are looking for? thanks mahadev
    Mahadev Konar
    Nov 12, 2011 at 9:20 pm
    Nov 12, 2011 at 11:04 pm
  • Hi! I want to understand how tasks are assigned to a tasktracker/node in mumak, then change it. For that I see that in SimulatorJobTracker.java @ Line 432 the assignTasks() method is important. L432: ...
    Arun k
    Nov 12, 2011 at 7:07 am
    Nov 12, 2011 at 11:46 am
  • Hi, when I am trying to build hadoop-mapreduce-project in hadoop trunk I came across this. Please help me out with this. Thanks in advance. [ERROR] COMPILATION ERROR : [INFO] ...
    Rajesh putta
    Nov 11, 2011 at 9:04 pm
    Nov 12, 2011 at 2:24 am
  • Hadoop can set the maximum mappers and reducers running on a node but under 0.20.2 I do not see a way to limit the system from running mappers and reducers together with the total exceeding ...
    Steve Lewis
    Nov 10, 2011 at 3:06 am
    Nov 10, 2011 at 4:23 am
  • Hello list, I have a task where I have to compare the entries of a big sequencefile with the entries of many small sequencefiles. Basically you could describe it like this: for entry in bigSequenceFile: ...
    Claudio Martella
    Nov 5, 2011 at 5:48 pm
    Nov 7, 2011 at 4:27 pm
  • Hi, Suppose I have chained M/R jobs that traverse a graph and look for nodes with a specific value. Every time a Map encounters that value, I'd like to keep that node in the final result. I can of ...
    Yaron Gonen
    Nov 6, 2011 at 4:22 pm
    Nov 7, 2011 at 4:19 pm
  • Hi guys ! I see that hadoop doesn't capture the Map task I/O time and Reduce task I/O time and captures only map runtime and reduce runtime. Am i right ? By I/O time for map task i meant time taken ...
    Arun k
    Nov 29, 2011 at 2:25 pm
    Nov 29, 2011 at 2:25 pm
  • Hi guys ! I was trying to use Rumen to generate trace files. I have few queries : Q1 Is there any way to create a new trace file from job history logs with custom set of split locations ? Q2 Can we ...
    Arun k
    Nov 25, 2011 at 2:42 pm
    Nov 25, 2011 at 2:42 pm
  • Hi, I am trying to install hadoop 0.23 and form a small cluster with 3 machines. Whenever I try to start the nodemanager and resource manager, the nodemanager fails to start by throwing the following error ...
    Sri ram
    Nov 25, 2011 at 8:59 am
    Nov 25, 2011 at 8:59 am
  • Hi ! I see that InputFormat, FileSplit and FileInputFormat files are involved in creating the splits of the input data for jobs. I am interested in knowing how this splitting and choosing hosts where ...
    Arun k
    Nov 18, 2011 at 5:37 pm
    Nov 18, 2011 at 5:37 pm
  • unsubscribe
    Francesco De Luca
    Nov 16, 2011 at 5:52 pm
    Nov 16, 2011 at 5:52 pm
  • Hi guys ! Q How can i assign data of each job in mumak nodes and what else i need to do ? In general how can i use the pluggable block-placement for HDFS in Mumak ? Meaning in my context i am using ...
    Arun k
    Nov 15, 2011 at 2:22 pm
    Nov 15, 2011 at 2:22 pm
  • Hello Fellow Hadoopists, We are meeting at *7:15 PM Nov 17th* at the University Heights Community Center 5031 University Way NE Seattle WA 98105 Room #*209 (upstairs)* *Seattle Hadoop Distributing ...
    Sean Jensen-Grey
    Nov 11, 2011 at 8:28 pm
    Nov 11, 2011 at 8:28 pm
  • Hi guys! I have gone through the Mumak code. I ran mumak.sh with the given Job and Topology trace files. In my understanding, I see that when a job is fetched from JobStoryProducer an event is associated with ...
    Arun k
    Nov 10, 2011 at 10:19 am
    Nov 10, 2011 at 10:19 am
  • I have an OutputFormat which implements Configurable. I set new config entries to a job configuration during checkOutputSpec() so that the tasks will get the config entries through the job ...
    Jane Chen
    Nov 8, 2011 at 7:34 pm
    Nov 8, 2011 at 7:34 pm
Group Overview
group: mapreduce-user @

66 users for November 2011

  • Harsh J: 21 posts
  • Robert Evans: 12 posts
  • Stephen Boesch: 12 posts
  • Steve Lewis: 10 posts
  • Something Something: 9 posts
  • Bejoy Ks: 7 posts
  • Russell Brown: 7 posts
  • Uma Maheswara Rao G 72686: 7 posts
  • Arun k: 6 posts
  • Hoot Thompson: 6 posts
  • Mapred Learn: 5 posts
  • Sudharsan Sampath: 5 posts
  • Arun Murthy: 4 posts
  • Keren Ouaknine: 4 posts
  • Mohamed Riadh Trad: 4 posts
  • Praveen Sripati: 4 posts
  • Milind Bhandarkar: 3 posts
  • Claudio Martella: 3 posts
  • Dan Young: 3 posts
  • David Rosenstrauch: 3 posts