
60 discussions - 226 posts

  • Hi, I have some questions related to basic functionality in Hadoop. 1. When a Mapper processes the intermediate output data, how does it know how many partitions to create (i.e., how many reducers there will be), and how ...
    Grandl Robert
    Jul 8, 2012 at 1:39 am
    Jul 16, 2012 at 8:32 pm
  • I have a job that is emitting over 3 billion rows from the map to the reduce. The job is configured with 43 reduce tasks. A perfectly even distribution would amount to about 70 million rows per ...
    Dave Shine
    Jul 20, 2012 at 1:20 pm
    Jul 25, 2012 at 4:11 pm
  • Hi, I am seeking a way to leverage Hadoop's distributed cache in order to ship JARs that are required to bootstrap a task's JVM, i.e., before a map/reduce task is launched. As a concrete example, ...
    Stan Rosenberg
    Jul 30, 2012 at 10:24 pm
    Jan 18, 2013 at 1:29 am
  • Hello, I have an MR job that talks to HBase. I use Gora to talk to HBase. Gora also provides a couple of classes that can be extended to write Mappers and Reducers, if the mappers need input from an ...
    Sriram Ramachandrasekaran
    Jul 27, 2012 at 7:55 am
    Jul 30, 2012 at 3:54 pm
  • I am using MapReduce streaming with Python code. It works fine for basic stdin and stdout. But I have a mapper-only application that also emits some other output files. So in addition to stdout, ...
    Connell, Chuck
    Jul 11, 2012 at 8:48 pm
    Jul 12, 2012 at 7:03 pm
  • Hi, I am trying to solve a problem where I need to compute the frequencies of words occurring in file1 from file2. For example: text in file1: hadoop user hello world and text in file2 is: hadoop ...
    Shanu Sushmita
    Jul 20, 2012 at 4:13 pm
    Jul 23, 2012 at 6:24 pm
  • I have a 2-node Fedora system and, in cluster mode, I have the following issue that I can't resolve. Hadoop 1.0.3. I'm running with filesystem file:/// and invoking the simple 'grep' example hadoop ...
    Steve Sonnenberg
    Jul 20, 2012 at 4:37 pm
    Jul 23, 2012 at 6:19 pm
  • Hi, I have a job with, let us say, 10 mappers running in parallel. Some run fast, but a few of them take too long. For example, a few mappers take 5 to 10 minutes but others are ...
    Kasi Subrahmanyam
    Jul 4, 2012 at 12:03 pm
    Jul 10, 2012 at 8:39 am
  • Hi all, I recently upgraded from CDH4b2 (0.23.1) to CDH4 (2.0.0). Now for some strange reason, my MRv2 jobs (TeraGen, specifically) fail if I run with more than one slave. For every slave except the ...
    Jul 17, 2012 at 9:25 pm
    Jul 18, 2012 at 1:26 am
  • Hello list, what could be the approximate maximum size of the files that can be handled using WholeFileInputFormat? I mean, if the file is very big, is it feasible to use ...
    Mohammad Tariq
    Jul 10, 2012 at 1:02 pm
    Jul 11, 2012 at 10:33 pm
  • Hi, the number of mappers depends on the number of blocks. Is it possible to limit the number of mappers without increasing the HDFS block size? Thanks in advance. Cheers! Manoj.
    Manoj Babu
    Jul 11, 2012 at 12:30 pm
    Jul 11, 2012 at 3:02 pm
  • Hi all: How can I distribute one map's data to all reduce tasks?
    Jul 5, 2012 at 3:31 am
    Jul 5, 2012 at 8:10 am
  • Hi all, I have a Hadoop cluster with 3 nodes, and the network topology is as follows: 1. Each DataNode has an IP address like 192.168.0.XXX; 2. The NameNode has two network cards: one is ...
    Jason Yang
    Jul 4, 2012 at 9:25 am
    Jul 5, 2012 at 4:58 am
  • Is there any support for server-side control of DAG jobs? I mean something that, in a Tool, creates Jobs and submits them with a list of dependencies and then exits without waiting for the jobs to finish. I ...
    Radim Kolar
    Jul 30, 2012 at 7:30 pm
    Aug 1, 2012 at 6:14 pm
  • Hi, one of my programs creates a huge Python dictionary, and the reducers fail with a Memory Error every time. Is there a way to give the reducers more memory so that they succeed? I know ...
    Mapred Learn
    Jul 29, 2012 at 8:17 am
    Jul 29, 2012 at 6:46 pm
  • I have an MR job that reads files on Amazon S3 and processes the data on local HDFS. The files are gzipped text files (.gz). I tried to set up the job as below, but it won't work. Does anyone know what might be ...
    Dan Yi
    Jul 20, 2012 at 12:45 am
    Oct 2, 2012 at 1:08 pm
  • Hi, does anybody know of cases where the output/input ratio for map tasks is larger than 1? The only ones I can think of are sort, where it's 1, and search jobs, where it's usually smaller than ...
    Jul 30, 2012 at 6:47 pm
    Jul 30, 2012 at 9:57 pm
  • Hi all, I have -XX:+HeapDumpOnOutOfMemoryError set on all nodes, and I don't have permission to set the location where those dumps will be saved, so I get a message in my mapred process ...
    Marek Miglinski
    Jul 18, 2012 at 5:50 pm
    Jul 19, 2012 at 4:45 pm
  • Hi, I am trying to compile Hadoop from the command line with something like 'ant compile jar run'. However, it always deletes the content of the conf files (hadoop-env.sh, core-site.xml, mapred-site.xml, ...
    Grandl Robert
    Jul 17, 2012 at 8:17 pm
    Jul 18, 2012 at 2:42 am
  • I'm using Hadoop 1.0.3 on a small cluster (1 namenode, 1 jobtracker, 2 compute nodes). My input is a sequence file of around 280 MB. Generally, my jobs run just fine and all finish in 2-5 ...
    Robert Dyer
    Jul 13, 2012 at 4:03 am
    Jul 17, 2012 at 8:28 pm
  • Hi, I need to upload large XML files daily. Right now I have a small program that reads all the files from a local folder and writes them to HDFS as a single file. Is this the right way? If there ...
    Manoj Babu
    Jul 13, 2012 at 3:30 am
    Jul 13, 2012 at 6:15 am
  • Hi, it would be great if you could answer the questions below. 1. How do I change the NameNode storage directory? [I tried hadoop.tmp.dir and hadoop.name.dir, but it leads to other issues, but ...
    Manoj Babu
    Jul 10, 2012 at 5:55 am
    Jul 10, 2012 at 7:33 am
  • Could someone give me a list of the basic Java classes that are needed to run a program in Hadoop? By basic classes I mean classes like Mapper and Reducer that are essential in running programs in ...
    Andrew Botelho
    Jul 30, 2012 at 7:26 pm
    Jul 30, 2012 at 11:42 pm
  • Hi, I am trying to modify the code for the data transfer of intermediate output. In this respect, on the reduce side in getMapOutput I want to have the connection with the ...
    Grandl Robert
    Jul 30, 2012 at 4:11 pm
    Jul 30, 2012 at 5:56 pm
  • Hi, I am a beginner with Hadoop, and I would like to know which Java classes are the right ones to use with Hadoop. In other words, which Java classes should be used as a strong foundation to ...
    Andrew Botelho
    Jul 30, 2012 at 3:03 pm
    Jul 30, 2012 at 3:28 pm
  • Hello list, I am trying to run a small MapReduce job that includes KeyValueTextInputFormat with the new API (hadoop-, but it seems KeyValueTextInputFormat is not included in the new API. Am ...
    Mohammad Tariq
    Jul 25, 2012 at 2:38 pm
    Jul 25, 2012 at 3:03 pm
  • Sometimes there is a big discrepancy between this time and the real running time at the task level. It can be significantly less than the real running time; in some cases, I observe that it is longer ...
    Jul 17, 2012 at 1:14 pm
    Jul 17, 2012 at 6:20 pm
  • Gentles, I want to use the CombineFileInputFormat of Hadoop 0.20.0 / 0.20.2 such that it processes 1 file per record and also doesn't compromise on data locality (which it normally takes care of) ...
    Manoj Babu
    Jul 12, 2012 at 3:04 pm
    Jul 12, 2012 at 4:10 pm
  • Hi, I am using the programmatic call to initialize the Hadoop job ("jobClient.submitJob( m_JobConf )"). I need to put a big object in the distributed cache, so I serialize it and send it over. With the ...
    Jul 11, 2012 at 3:46 am
    Jul 11, 2012 at 1:23 pm
  • Hey guys, I need to update a field in an HBase table and I want to use a MapReduce job for that. I can do it using both the map and reduce phases. However, that does not make any sense to me, since map will pass ...
    Pablo Musa
    Jul 10, 2012 at 8:23 pm
    Jul 10, 2012 at 11:06 pm
  • Hello list, is it possible to emit Java collections from a mapper? My code looks like this: public class UKOOAMapper extends Mapper<LongWritable, Text, LongWritable, List<Text>> { public static Text ...
    Mohammad Tariq
    Jul 10, 2012 at 11:16 am
    Jul 10, 2012 at 11:40 am
  • I am running a (terasort) job on a small cluster but with powerful nodes. The number of reducer slots was 12. I am seeing the following message: Job JOBID="job_201207031814_0011" ...
    Stephen Boesch
    Jul 4, 2012 at 3:47 pm
    Jul 7, 2012 at 7:25 pm
  • Hi, I'm trying to move from CDH3U3 to CDH4. My existing MR program works fine on CDH3U3, but I can't get it to run on CDH4. Basically, my Driver class 1. queries a PG DB and writes some HashMaps to ...
    Alan Miller
    Jul 4, 2012 at 3:51 pm
    Jul 4, 2012 at 7:33 pm
  • While emitting a record from my mapper, I am receiving a NullPointerException. The stack trace seems to indicate there is a problem serializing the Key. The key contains a few strings and a few ...
    Berry, Matt
    Jul 3, 2012 at 9:16 pm
    Jul 4, 2012 at 7:32 pm
  • Hi everyone, I don't understand the steps to follow to deploy MRv1 with CDH4: install the service? run the service? configure the service? mapred-site.xml? Must I have the script retrieve and recreate CDH3 ...
    Antoine Boudot
    Jul 30, 2012 at 1:20 pm
    Jul 30, 2012 at 3:54 pm
  • I have a slightly modified Text Output Format that essentially writes each key into its own file. It operates off the premise that my reducer is an identity function and it emits each record ...
    Berry, Matt
    Jul 20, 2012 at 12:29 am
    Jul 20, 2012 at 6:07 pm
  • A simplified version of my use-case is to sort a large number of records, and then write all the ones that start with A to a file named A, B to B, etc. Due to the fact that each file can only be ...
    Berry, Matt
    Jul 19, 2012 at 4:23 pm
    Jul 19, 2012 at 4:40 pm
  • Hello, as far as I understand, the bulk import functionality does not take data locality into account. The MR job will create as many reducer tasks as there are regions to write into, but it will ...
    Alex Baranau
    Jul 18, 2012 at 3:46 pm
    Jul 19, 2012 at 2:55 am
  • Team, is there a way to increase the number of reducers in a MapReduce program? I have increased mapred.tasktracker.reduce.tasks.maximum to 2 in the configuration. Is there a way we can increase it in ...
    Syed kather
    Jul 18, 2012 at 4:16 pm
    Jul 18, 2012 at 4:53 pm
  • Hi, the CDH security guide says: "Important: Remember that the user who launches the job must exist on every node." But I am actually able to submit a job as a user without being ...
    Nishan Shetty
    Jul 5, 2012 at 1:44 pm
    Jul 18, 2012 at 2:03 am
  • I would like to create a hierarchy of output files based on the keys passed to the reducer. The first folder level is the first few digits of the key, the next level is the next few, etc. I had ...
    Berry, Matt
    Jul 17, 2012 at 9:06 pm
    Jul 17, 2012 at 9:10 pm
  • Hi all, I have written MyCombineFileInputFormat, which extends CombineFileInputFormat. It puts multiple files together into the same InputSplit, and it works fine for a small number of files. But ...
    Jul 16, 2012 at 2:22 pm
    Jul 16, 2012 at 2:24 pm
  • Hi, is it possible to write to an HDFS DataNode without relying on the NameNode, i.e., to find the location of DataNodes from somewhere else? Thanks, Robert
    Grandl Robert
    Jul 11, 2012 at 9:07 pm
    Jul 12, 2012 at 5:46 am
  • Hi all, how do you kill a job or application when using MapReduce v2 (YARN)? I tried the following: Could not find job job_1341398677537_0020. I tried the application ID, but it is invalid. I'm using ...
    Benoit Mathieu
    Jul 4, 2012 at 2:58 pm
    Jul 9, 2012 at 1:54 am
  • Hi, that command does not show a presently running job (I am tailing the JobTracker log, so yes, it certainly is still running). Actually, no jobs are displayed. So what is needed to sync up "hadoop job ...
    Stephen Boesch
    Jul 5, 2012 at 7:56 am
    Jul 9, 2012 at 1:51 am
  • Hi, I'm not able to generate an additional field as a sequence number in my code. Is there any possible solution?
    Avnish pundir
    Jul 6, 2012 at 11:08 am
    Jul 6, 2012 at 1:42 pm
  • Hi, I am working on building a MapReduce pipeline of jobs (with one MR job's output feeding into another as input). The values being passed around are fairly complex, in that there are lists of ...
    Guruprasad DV
    Jul 3, 2012 at 7:56 am
    Jul 3, 2012 at 9:13 am
  • Another lab has a cluster with about a thousand nodes. I have been using eight of their nodes for some hadoop development. Recently my group was offered the use of the entire cluster at times. They ...
    Steve Lewis
    Jul 30, 2012 at 3:33 pm
    Jul 30, 2012 at 3:33 pm
  • Hi, I am very interested in learning how to implement a custom Partitioner. If there is a blog post on this, please let me know. As I understand it, the number of reducers depends on the partitioner. Correct me if I am ...
    Syed kather
    Jul 23, 2012 at 5:53 pm
    Jul 23, 2012 at 5:53 pm
  • Hello, I implemented a very simple custom WritableComparable which works fine as is but I wanted to improve performance by implementing the appropriate WritableComparator as well. The comparator does ...
    David Koch
    Jul 23, 2012 at 5:09 pm
    Jul 23, 2012 at 5:09 pm
Period: Jul 2012
Group: mapreduce-user @

67 users for July 2012

Harsh J: 40 posts; Manoj Babu: 14 posts; Arun C Murthy: 10 posts; Mohammad Tariq: 10 posts; Grandl Robert: 9 posts; Berry, Matt: 8 posts; Dave Shine: 7 posts; Karthik Kambatla: 7 posts; Shanu Sushmita: 6 posts; Sriram Ramachandrasekaran: 6 posts; Bejoy KS: 5 posts; Connell, Chuck: 5 posts; GUOJUN Zhu: 4 posts; Kasi Subrahmanyam: 4 posts; Steve Sonnenberg: 4 posts; Subir S: 4 posts; Syed kather: 4 posts; Andreas Reiter: 3 posts; Jason Yang: 3 posts; Mapred Learn: 3 posts