65 discussions - 225 posts

  • Hi, everybody. I'm running into some difficulties getting needed libraries to map/reduce tasks using the distributed cache. I'm using Hadoop 0.20.2, which from what I can tell is a hard requirement ...
    John Armstrong
    May 26, 2011 at 2:46 pm
    Jun 1, 2011 at 8:06 pm
  • Hi, I have my application installed on Tomcat and I wish to submit M/R jobs programmatically. Is there a standard way to do that? Thanks, Lior
    Lior Schachter
    May 18, 2011 at 2:59 pm
    May 19, 2011 at 4:12 pm
  • Hi, I just upgraded to hadoop-, and am having a problem starting the datanode: # hadoop datanode Unrecognized option: -jvm Could not create the Java virtual machine. It looks like it has ...
    Anh Nguyen
    May 20, 2011 at 5:35 pm
    May 20, 2011 at 11:00 pm
  • Hi, I have a few input splits that are a few MB in size. I want to submit 1 GB of input to every mapper. How can I do it? Currently each mapper gets one input split, which results in many small map-output ...
    Mapred Learn
    May 25, 2011 at 12:18 am
    May 25, 2011 at 7:44 pm
  • Hi all, There are lots of SequenceFiles in HDFS. How can I merge them into one SequenceFile? Thanks for your suggestions. -Lin
    May 12, 2011 at 12:18 am
    May 25, 2011 at 7:26 pm
  • Hi, I'm running hadoop map-reduce in a cluster with 10 machines. I would like to set in the configuration that each tasktracker can run 8 map tasks simultaneously and 4 reduce tasks simultaneously. ...
    Pedro Costa
    May 23, 2011 at 4:46 pm
    May 24, 2011 at 11:01 am
  • Hi guys, I have a cluster of 16 machines running Hadoop. Now I want to run some benchmarks on this cluster with "nnbench" and "mrbench". I'm new to the hadoop thing and have no one to refer to. I ...
    Stanley Shi
    May 8, 2011 at 12:55 am
    May 10, 2011 at 2:26 pm
  • Hello Hadoop Users, I would like to know if anyone has ever tried splitting an input sequence file by key instead of by size. I know that this is unusual for the map reduce paradigm but I am in a ...
    Vincent Xue
    May 23, 2011 at 9:09 am
    May 24, 2011 at 4:06 am
  • Hi All, I'm trying to run hadoop(0.20.2) examples in Pseudo-Distributed Mode following the hadoop user guide. After I run the 'start-all.sh', it seems the namenode can't connect to datanode. 'SSH ...
    May 19, 2011 at 3:23 am
    May 20, 2011 at 4:05 pm
  • Hi, I'm a newbie with Hadoop/MapReduce and I have a problem. I set some variables in the run function, but when the Map runs, it can't get the values of these variables... If anyone knows the ...
    Laurent Hatier
    May 27, 2011 at 11:52 am
    May 30, 2011 at 8:30 am
  • Does anyone know how to save and retrieve an instance of a class using the Configuration class?
    Michael Giannakopoulos
    May 25, 2011 at 6:01 pm
    May 26, 2011 at 4:11 pm
  • Hello guys, I have written an application that downloads metadata from 3 Flickr groups, and I implemented a map/reduce task so that the metadata is processed by 3 different mappers (each corresponds to ...
    Michael Giannakopoulos
    May 25, 2011 at 3:52 pm
    May 25, 2011 at 4:20 pm
  • In general, the Java interfaces say that one invocation of a combiner (technically, a Class<? extends Reducer>) can output multiple (key,value) pairs. So: What happens if one invocation of a combiner ...
    Mike Spreitzer
    May 23, 2011 at 6:33 pm
    May 23, 2011 at 9:22 pm
  • Dear all, We have a task to run a map-reduce job multiple times to do some machine learning calculation. We will first use a mapper to update the data iteratively, and then use the reducer to process ...
    Stanley Xu
    May 3, 2011 at 6:09 am
    May 4, 2011 at 10:20 am
  • any comments??? 2011/4/28 baran cakici <barancakici@gmail.com>
    Baran cakici
    May 2, 2011 at 11:38 am
    May 4, 2011 at 9:21 am
  • All, I have three questions I would appreciate if anyone could weigh in on. I apologise in advance if I sound whiny. 1. The namenode logs, when I view them from a browser, are displayed with the ...
    Geoffry Roberts
    May 3, 2011 at 5:22 pm
    May 3, 2011 at 11:11 pm
  • Hi, I was trying to create a test based on a mapreduce job in local mode, testing various partitioning issues. But curiously, whenever I switch mapreduce into local mode, I can't seem to be able to ...
    Dmitriy Lyubimov
    May 2, 2011 at 10:53 pm
    May 3, 2011 at 12:28 am
  • Thanks for your replies! I use TableOutputFormat to delete entries from a HBase Table and MO for HFileOutputFormat. Until yesterday I used normal HFileOutputFormat output (not MO) and the files ...
    Panayotis Antonopoulos
    May 31, 2011 at 2:27 am
    May 31, 2011 at 7:42 am
  • All, I am mostly seeking confirmation as to my thinking on this matter. I have an MR job that I believe will force me into using a single reducer. The nature of the process is one where calculations ...
    Geoffry Roberts
    May 12, 2011 at 5:44 pm
    May 13, 2011 at 2:34 pm
  • Hi, all. I want to write lots of little files (32GB) to HDFS as org.apache.hadoop.io.SequenceFile. But now it is too slow: we use about 8 hours to create this SequenceFile (6.7GB). So I wonder how to ...
    May 11, 2011 at 11:49 pm
    May 12, 2011 at 3:56 pm
  • We have intermittently seen cases where a job will "freeze" for some as yet unknown reason, and thereby block other processes waiting for that job to complete. I'm trying to modify our job-launching ...
    Adam Phelps
    May 10, 2011 at 10:45 pm
    May 11, 2011 at 5:28 am
  • I have a basic job that is dying, I think, on one badly compressed file. Is there a way to see what file it is choking on? Via the job tracker I can find the mapper that is dying but I cannot find a ...
    Jonathan Coveney
    May 10, 2011 at 3:36 pm
    May 10, 2011 at 5:25 pm
  • All, I need each one of my reducers to have read access to a certain object or a clone thereof. I can instantiate this object at start up. How can I give my reducers a copy? -- Geoffry Roberts
    Geoffry Roberts
    May 6, 2011 at 5:13 pm
    May 6, 2011 at 5:59 pm
  • Hello, I just noticed that the files that are created using MultipleOutputs remain in the temporary folder, in attempt sub-folders, when there is no normal output (using context.write(...)). Has ...
    Panayotis Antonopoulos
    May 30, 2011 at 3:33 pm
    May 30, 2011 at 7:51 pm
  • Does anyone know the mechanism that Hadoop uses to load the Map and Reduce classes on the remote node where the JobTracker submits the tasks? In particular, how does Hadoop retrieve the .class files? Thanks
    Francesco De Luca
    May 27, 2011 at 2:17 pm
    May 27, 2011 at 3:17 pm
  • Hi, My question is: when I run a command from an hdfs client, e.g. hadoop fs -copyFromLocal, or create a sequence file writer in java code and append key/values to it through Hadoop APIs, does it ...
    Mapred Learn
    May 18, 2011 at 12:44 am
    May 27, 2011 at 5:07 am
  • Is it possible to specify the slave nodes dynamically instead of specifying them statically through the configuration files? I want to build a dynamic environment in which each node can enter and exit, ...
    Francesco De Luca
    May 24, 2011 at 10:43 am
    May 24, 2011 at 1:14 pm
  • Hi, I have implemented a custom record reader to read fixed length records. Pseudo code is as: class CRecordReader extends RecordReader<Text, BytesWritable> { private FileSplit fileSplit; private ...
    Mapred Learn
    May 20, 2011 at 12:45 am
    May 23, 2011 at 9:33 pm
  • Hi guys, I've just found a problem with the class TableSplit. It implements "equals", but it does not implement hashCode also, as it should have. I've discovered it by trying to use a HashSet of ...
    Lucian Iordache
    May 20, 2011 at 10:39 am
    May 20, 2011 at 1:13 pm
  • Hello, I have a Hadoop/Hbase cluster, cloudera cdh3u0 version, with several machines. I've created a job that does some work using the information from a HBase table. The next example *works* fine: - ...
    Lucian Iordache
    May 16, 2011 at 3:31 pm
    May 18, 2011 at 9:00 am
  • All, I am attempting to pass a string value from my driver to each one of my mappers and it is not working. I can set the value, but when I read it back it returns null. The value is not null when I ...
    Geoffry Roberts
    May 11, 2011 at 3:11 pm
    May 11, 2011 at 4:51 pm
  • Hi all, I remember there's a property for setting the number of failed tasks that can be tolerated in one job. Does anyone know the property name? -- Best Regards Jeff Zhang
    Jeff Zhang
    May 10, 2011 at 8:33 am
    May 11, 2011 at 8:56 am
  • Hi, I've written my first very simple job that does something with hbase. Now when I try to submit my jar in my cluster I get this: [nbasjes@master ~/src/catalogloader/run]$ hadoop jar ...
    Niels Basjes
    May 3, 2011 at 1:43 pm
    May 10, 2011 at 4:11 pm
  • Hi, I'm running hadoop (Cloudera release 3) in pseudo distributed mode, with the linux task controller so that jobs will run as the user who submitted them. My program (which uses hadoop cascading) ...
    May 6, 2011 at 6:45 pm
    May 6, 2011 at 7:28 pm
  • Hi, I have extracted the hadoop-0.20.2, hadoop- and hadoop-0.21.0 files. In the hadoop-0.21.0 folder the hadoop-hdfs-0.21.0.jar, hadoop-mapred-0.21.0.jar and the hadoop-common-0.21.0.jar ...
    Praveen Sripati
    May 30, 2011 at 1:46 pm
    May 30, 2011 at 1:50 pm
  • Hi, We have a MapReduce program which writes data to a MySQL database using DBOutputFormat. Our program has one reducer. I understand that all the inserts happen during the close() operation of the ...
    Giridhar Addepalli
    May 25, 2011 at 8:58 pm
    May 25, 2011 at 10:51 pm
  • Hello all, How do I print the status of each job on the client, with the % complete? I am invoking the hadoop jobs using the java client (not the hadoop cli) and I am not seeing the map and reduce job ...
    Praveen Peddi
    May 23, 2011 at 10:45 pm
    May 23, 2011 at 11:35 pm
  • I need to save some data in the job config as part of OutputFormat.checkOutputSpecs(), and have it propagated to map tasks. It seems that the property is saved correctly when ...
    Jane Chen
    May 20, 2011 at 6:25 pm
    May 21, 2011 at 5:09 pm
  • Hi All, I was investigating the ways to profile the hadoop code. All I found is to use ...
    Shuja Rehman
    May 19, 2011 at 8:57 am
    May 19, 2011 at 1:13 pm
  • Hi all, How can I run an MR job through my own program instead of using the console to submit a job to a real Hadoop env? I wrote code like this; the program works fine but I don't think it ran in my ...
    May 17, 2011 at 10:47 am
    May 17, 2011 at 9:39 pm
  • Hi, As of now, primary namenode and secondary namenode are running on the same machine in our configuration. As both are RAM heavy processes, we want to move secondary namenode to another machine in ...
    Giridhar Addepalli
    May 13, 2011 at 9:14 am
    May 13, 2011 at 11:43 am
  • Hi, 1 - I'm looking for the map and reduce functions of the several examples of the Gridmix2 platform (Webdatasort, Webdatascan, Monsterquery, javasort and combiner) and I can't find them. Where can I ...
    Pedro Costa
    May 2, 2011 at 10:09 am
    May 10, 2011 at 3:11 pm
  • Hi, I have a job that processes raw data inside tarballs. As job input I have a text file listing the full HDFS path of the files that need to be processed, e.g.: ... /user/eric/file451.tar.gz ...
    May 9, 2011 at 9:48 am
    May 9, 2011 at 6:25 pm
  • All, I am attempting to take a large file and split it up into a series of smaller files. I want the smaller files to be named based on values taken from the large file. I am using ...
    Geoffry Roberts
    May 6, 2011 at 5:56 pm
    May 6, 2011 at 6:16 pm
  • Dear All, Our team is trying to implement a parallelized LDA with Gibbs Sampling. We are using the algorithm mentioned by plda, http://code.google.com/p/plda/ The problem is that by the Map-Reduce ...
    Stanley Xu
    May 5, 2011 at 9:17 am
    May 5, 2011 at 3:30 pm
  • Hi guys, I asked this question earlier but did not get any response. So, posting again. Hope somebody can point to the right description: When you do hadoop fs -copyFromLocal or use API to call ...
    Mapred Learn
    May 31, 2011 at 11:57 pm
    May 31, 2011 at 11:57 pm
  • I need to track every state change in a Job and in all its tasks. In particular, I also need to track Job and Task failures. Does any API exist to do this? Something like the Publish-Subscribe paradigm? ...
    Francesco De Luca
    May 24, 2011 at 10:27 am
    May 24, 2011 at 10:27 am
  • Hi Karthik, FYI, I'm moving this thread to mapreduce-user@hadoop.apache.org (You and common-user are BCCed). My guess is that your task trackers are throwing a lot of exceptions which are getting ...
    Joey Echeverria
    May 23, 2011 at 12:46 pm
    May 23, 2011 at 12:46 pm
  • Hi Mark, FYI, I'm moving the discussion over to mapreduce-user@hadoop.apache.org since your question is specific to MapReduce. You can derive the output name from the TaskAttemptID which you can get ...
    Joey Echeverria
    May 23, 2011 at 12:42 pm
    May 23, 2011 at 12:42 pm
  • All, I have a job where I need to have but a single reducer, i.e. job.setNumReduceTasks(1); When I do this, I get no log file for the reducer; mappers yes, but reducer no. If I remove the ...
    Geoffry Roberts
    May 20, 2011 at 4:18 pm
    May 20, 2011 at 4:18 pm
Group Overview
group: mapreduce-user @

65 users for May 2011

  • Harsh J: 16 posts
  • Geoffry Roberts: 15 posts
  • Mapred Learn: 14 posts
  • Marcos Ortiz: 11 posts
  • Pedro Costa: 8 posts
  • Joey Echeverria: 7 posts
  • Aaron Baff: 6 posts
  • John Armstrong: 6 posts
  • Michael Giannakopoulos: 6 posts
  • Anh Nguyen: 5 posts
  • Francesco De Luca: 5 posts
  • Jason: 5 posts
  • Panayotis Antonopoulos: 5 posts
  • Robert Evans: 5 posts
  • Stanley Xu: 5 posts
  • 丛林: 5 posts
  • Stanley Shi: 4 posts
  • Baran cakici: 4 posts
  • Laurent Hatier: 4 posts
  • Lior Schachter: 4 posts