
167 discussions - 718 posts

  • Hi, could you please suggest which classes to use, or another better way to achieve this? I am getting an OutputCollector in my reduce function as: void reduce(....) { output.collect(key, value); } Here key is ...
    Aayush Garg
    Apr 15, 2008 at 10:20 pm
    Apr 17, 2008 at 5:44 pm
  • Greetings, I'm compiling a list of (free/OSS) tools commonly used to administer Linux clusters to help my company transition away from Win solutions. I use Ganglia for monitoring the general stats of ...
    Bradford Stephens
    Apr 29, 2008 at 8:37 pm
    May 2, 2008 at 4:42 pm
  • Hello, We are using Hadoop here at Stony Brook University to power the next-generation text analytics backend for www.textmap.com. We also have an NFS partition that is mounted on all machines of our ...
    Mikhail Bautin
    Apr 18, 2008 at 10:03 pm
    Apr 22, 2008 at 12:32 pm
  • Hi, I have no reduces. I would like to write my map results directly to disk as they are produced, after each map has completed. I don't want to collect them and then write to output. If I wanted to directly ...
    Kayla Jay
    Apr 18, 2008 at 12:50 pm
    Apr 18, 2008 at 7:03 pm
  • I tried to set up Hadoop with Cygwin according to the paper: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873 But I had problems working with DynDNS. I created a new host ...
    Prerna Manaktala
    Apr 15, 2008 at 6:01 pm
    Apr 17, 2008 at 5:38 pm
  • Hello all, Hadoop newbie here, asking: what's the preferred way to handle large (~1 million) collections of small files (10 to 100KB) in which each file is a single "record"? 1. Ignore it, let Hadoop ...
    Stuart Sierra
    Apr 23, 2008 at 3:56 pm
    Apr 28, 2008 at 4:13 pm
  • Hello, I tried to track down the problem with wrong counter values but didn't find any information or similar cases. Maybe it is a feature I don't understand. The problem is that in a local installation ...
    Apr 9, 2008 at 8:35 am
    Apr 17, 2008 at 4:22 pm
  • Hi, can I give a directory (with subdirectories) as the input path to a Hadoop Map-Reduce job? I tried, but got an error. Can Hadoop recursively traverse the input directory and collect all the file names ...
    Tarandeep Singh
    Apr 1, 2008 at 4:15 pm
    Apr 1, 2008 at 9:34 pm
  • Hi, I am running Hadoop 0.15.3 on 2 EC2 instances from a public AMI (ami-381df851). Our input files are on S3. When I try to do a distcp for an input file from S3 onto HDFS on EC2, the copy fails ...
    Prasan Ary
    Apr 1, 2008 at 8:18 pm
    Apr 4, 2008 at 8:40 am
  • Hello, I'm sorry if a question like this has been asked before, but I was unable to find an answer for this anywhere on google; if it is off-topic, I apologize in advance. I'm trying to look a bit ...
    Leon Mergen
    Apr 24, 2008 at 5:17 pm
    Apr 24, 2008 at 7:38 pm
  • Hi, can I submit a map-reduce job without creating the jar file (and using the $HADOOP_HOME/bin/hadoop script)? I looked into the hadoop script and it is invoking the org.apache.hadoop.util.RunJar class. ...
    Tarandeep Singh
    Apr 22, 2008 at 10:19 pm
    Apr 23, 2008 at 4:54 pm
  • Hi all: Is there any tool that can be used to read the SequenceFile or other InputFormat/OutputFormat? Best Wishes! Samuel Guo
    Samuel Guo
    Apr 16, 2008 at 8:25 am
    Apr 16, 2008 at 2:28 pm
  • Is there a way to run multiple datanodes on the same machine? Best Regards, Cagdas Evren Gerede Home Page: http://cagdasgerede.info
    Cagdas Gerede
    Apr 15, 2008 at 6:22 pm
    Apr 15, 2008 at 10:27 pm
  • Hi, I would like to use binary input and output data in combination with Hadoop Streaming. The reason I want to use binary data is that parsing text to float seems to consume a lot of time ...
    John Menzer
    Apr 7, 2008 at 1:42 pm
    Apr 15, 2008 at 7:33 am
  • Has anyone tried to build Hadoop? Thanks & Regards, Krishna.
    Krishna prasanna
    Apr 9, 2008 at 6:17 pm
    Apr 12, 2008 at 2:36 am
  • Hello list, I was unable to access the archives for this list as http://hadoop.apache.org/mail/core-user/ returns 403. I am interested in using HDFS for storage, and for map/reduce only tangentially. ...
    Todd Troxell
    Apr 10, 2008 at 5:22 am
    Apr 10, 2008 at 8:37 pm
  • Hi! I'm starting to take a look at Hadoop and the whole HDFS idea. I'm wondering whether it's fine to update or overwrite a file copied to Hadoop. Thanks, Garri
    Garri Santos
    Apr 3, 2008 at 6:40 am
    Apr 4, 2008 at 4:38 pm
  • Hi, I am using Hadoop streaming and I am trying to create a MapReduce that will generate output where a single key is found in a single output part file. Does anyone know how to ensure this ...
    Ashish Venugopal
    Apr 2, 2008 at 12:58 am
    Apr 4, 2008 at 12:14 am
  • I tried to set up Hadoop with Cygwin according to the paper: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873 But I had problems working with DynDNS. I created a new host ...
    Prerna Manaktala
    Apr 15, 2008 at 2:55 pm
    Apr 18, 2008 at 2:05 am
  • Hi all, I wrote a custom key class (implements WritableComparable) and implemented the compareTo() method inside this class. Everything works fine when I run the m/r job with 1 reduce task (via ...
    Harish Mallipeddi
    Apr 11, 2008 at 11:06 am
    Apr 12, 2008 at 6:19 pm
  • Hey all, We've got a job that we're running in both a development environment, and out on EC2. I've been rather displeased with the performance on EC2, and was curious if the results that we've been ...
    Nate Carlson
    Apr 10, 2008 at 2:07 am
    Apr 11, 2008 at 10:09 pm
  • Hi, I have a 4 node hadoop 0.15.3 cluster. I am using the default config files. I am running a map reduce job to process 40 GB log data. Some reduce tasks are failing with the following errors: 1) ...
    Apurva Jadhav
    Apr 23, 2008 at 6:41 am
    Apr 24, 2008 at 4:47 pm
  • Is there a way to get the list of files on each datanode? I need to be able to get the names of all the files on a specific datanode. Is there a way to do it?
    Shimi K
    Apr 21, 2008 at 9:12 am
    Apr 21, 2008 at 7:05 pm
  • Hi, I have reduce output like this: 206475 316475847 3846495 316975 But I want to display it like this: 206 475 316 475 847 384 6495 ...
    Natarajan, Senthil
    Apr 14, 2008 at 5:10 pm
    Apr 15, 2008 at 3:37 pm
  • Hi, I have a general-purpose input folder that is used as input to a Map/Reduce task. That folder contains files grouped by name. I want to configure the JobConf so that I can filter the files ...
    Alfonso Olias Sanz
    Apr 11, 2008 at 1:34 pm
    Apr 15, 2008 at 7:51 am
  • Hi, I'm running Hadoop (latest snapshot) on several machines, and in our setup the namenode and secondary namenode are on different systems. I see from the logs that the secondary namenode regularly ...
    Yuri Pradkin
    Apr 2, 2008 at 8:32 pm
    Apr 8, 2008 at 11:03 pm
  • Hi all, I have a streaming tool chain written in C++/Python that performs some operations on really big text files (on the order of gigabytes); the chain reads files and writes its result to standard output. ...
    Francesco Tamberi
    Apr 4, 2008 at 10:01 am
    Apr 4, 2008 at 9:45 pm
  • Hello all, I need to copy files from my linux file system to HDFS in a java program and not manually. This is the piece of code that I have. try { FileSystem hdfs = FileSystem.get(new ...
    Ajey Shah
    Apr 30, 2008 at 10:01 pm
    May 8, 2008 at 9:26 pm
  • We are using hadoop 0.16 and are seeing a consistent problem: out of memory errors when we have a large # of map tasks. The specifics of what is submitted when we reproduce this: three large jobs: 1. ...
    Lili Wu
    Apr 30, 2008 at 8:39 pm
    May 1, 2008 at 9:52 pm
  • Currently, block reports are computed by scanning all the files and folders on the local disk. This happens not only at startup, but also in periodic block reports. What is the intention behind doing ...
    Cagdas Gerede
    Apr 30, 2008 at 6:33 am
    May 1, 2008 at 6:23 pm
  • Hello Has anyone had any experience with processing xml files within Hadoop within their maps/reduces? In particular, has anyone used any sort of XQuery/XPath processing within their maps/reduces? ...
    Kayla Jay
    Apr 28, 2008 at 4:40 pm
    Apr 29, 2008 at 4:32 pm
  • Dear all, suppose that I have files that contain intermediate key-value pairs, and I want to combine these intermediate key-value pairs with a new MapReduce task. I want this MapReduce task to combine during the ...
    Dina Said
    Apr 19, 2008 at 8:55 pm
    Apr 26, 2008 at 6:41 am
  • Dear Hadoop Users, I'm writing to find out what you think about being able to incrementally re-execute a map reduce job. My understanding is that the current framework doesn't support it and I'd like ...
    Shirley Cohen
    Apr 17, 2008 at 12:26 am
    Apr 21, 2008 at 8:20 pm
  • Is it possible to execute a job more than once? I use map reduce when adding a new instance to a hierarchical cluster tree. It finds the least distant node and inserts the new instance as a sibling to ...
    Karl Wettin
    Apr 18, 2008 at 1:00 am
    Apr 18, 2008 at 3:55 pm
  • Hi, I am new to MapReduce. I slightly modified the wordcount example to count IP addresses. I have two files, part-00000 and part-00001, with contents something like: IP Add Count 1.2. 5. ...
    Natarajan, Senthil
    Apr 8, 2008 at 3:37 pm
    Apr 9, 2008 at 2:00 pm
  • Hi, How can I set a list or map to JobConf that I can access in Mapper/Reducer class ? The get/setObject method from Configuration has been deprecated and the documentation says - "A side map of ...
    Tarandeep Singh
    Apr 30, 2008 at 11:20 am
    May 2, 2008 at 12:34 pm
  • After trying out Hadoop in a single machine, I decided to run a MapReduce across multiple machines. This is the approach I followed: 1 Master 1 Slave (A doubt here: Can my Master also be used to ...
    Sridhar Raman
    Apr 23, 2008 at 7:04 am
    May 1, 2008 at 12:51 pm
  • Hi, I want to get the time the JobTracker was started (Not the time an individual job was started). Is there a way from the JobClient? I see you can get a ClusterStatus Object, but it doesn't include ...
    Pete Wyckoff
    Apr 30, 2008 at 2:51 am
    May 1, 2008 at 9:26 am
  • Hi, I am getting the following error when starting Hadoop in pseudo-distributed mode: bin/start-all.sh localhost: starting datanode, logging to ...
    Aayush Garg
    Apr 19, 2008 at 1:53 pm
    Apr 23, 2008 at 9:39 pm
  • Hi, I am trying to back up data to S3 from the hdfs using distcp, but it fails complaining of a null bucket. The bucket does exist and I can access it with s3sync from the local filesystem. Can ...
    Apr 17, 2008 at 1:29 am
    Apr 23, 2008 at 10:38 am
  • Good day, I successfully installed Hadoop and copied a test file to HDFS. I was wondering if it is possible to access the file directly without first getting it out of HDFS. Regards, Garri
    Garri Santos
    Apr 17, 2008 at 9:11 am
    Apr 18, 2008 at 2:38 am
  • Hello, Does anyone have any experience adding nodes to a cluster running on EC2? If so, is there some documentation on how to do this? Thanks, -stephen
    Stephen J. Barr
    Apr 16, 2008 at 2:03 am
    Apr 16, 2008 at 5:42 am
  • Hello, does anyone have a large web link graph? I want to experiment with and benchmark MapReduce on a real dataset. Thanks, With regards, Chaman Singh Verma, Poona, India ...
    Chaman Singh Verma
    Apr 15, 2008 at 3:30 pm
    Apr 15, 2008 at 4:18 pm
  • Hello, I'm trying to assemble a simple setup of 3 nodes using NFS as the distributed filesystem. Box A: this box is both the NFS server and a slave node. Box B: ...
    Apr 11, 2008 at 11:40 am
    Apr 11, 2008 at 6:35 pm
  • I am trying to run a Hadoop cluster on Amazon EC2 and backup all the data on Amazon S3 between the runs. I am using Hadoop 0.16.1 on a cluster made up of CentOS 5 images (ami-08f41161). I am able to ...
    Siddhartha Reddy
    Apr 4, 2008 at 8:35 am
    Apr 4, 2008 at 8:00 pm
  • Hi list, if I define a method named configure in a mapper class that tries to read a config file before all map tasks start, which class should I choose? A normal FileReader from the JDK or another Reader ...
    Jeremy Chow
    Apr 3, 2008 at 8:23 am
    Apr 4, 2008 at 3:57 pm
  • Hello. Is libhdfs thread-safe? I can run a single thread reading/writing HDFS through libhdfs without problems, but when increasing the number of threads to 2 or more, I receive a SIGSEGV error: # # An unexpected ...
    Yingyuan Cheng
    Apr 2, 2008 at 8:39 am
    Apr 3, 2008 at 6:18 pm
  • I have a question regarding a mapper task that needs to call 2 binaries. Ideally I would be able to do the following: mymapper.sh ./binary1 | ./binary2 stream -mapper mymapper.sh ... -file binary1 ...
    Ashish Venugopal
    Apr 7, 2008 at 9:10 pm
    Jul 9, 2009 at 5:53 am
  • Hi Sorry for my ignorance, but I am trying to understand if I can use Hadoop and Map/Reduce to process video files and images. Encoding and transcoding videos is an example of what I would like to ...
    Roland Rabben
    Apr 22, 2008 at 8:49 pm
    Apr 23, 2008 at 4:06 pm
  • Hello, I am developing an application with MapReduce in which, whenever some MapTask condition is met, I would like to broadcast to all other MapTasks to abort their work. I am not quite sure ...
    Chaman Singh Verma
    Apr 16, 2008 at 3:29 pm
    Apr 17, 2008 at 8:25 am
Group: common-user (April 2008)

173 users for April 2008

Ted Dunning: 89 posts Aayush Garg: 22 posts Amar Kamat: 21 posts Cagdas Gerede: 16 posts Devaraj Das: 14 posts Prerna Manaktala: 13 posts Andreas Kostyrka: 12 posts Chaman Singh Verma: 12 posts Norbert Burger: 12 posts Owen O'Malley: 12 posts Arun C Murthy: 11 posts dhruba Borthakur: 11 posts Prasan Ary: 11 posts Joydeep Sen Sarma: 10 posts Kayla Jay: 10 posts Natarajan, Senthil: 10 posts Colin Freas: 9 posts Khalil Honsali: 9 posts John Menzer: 8 posts Karl Wettin: 8 posts