

218 discussions - 885 posts

  • Hi. I've noticed that Hadoop spawns parallel copies of the same task on different hosts. I've understood that this is done to improve the performance of the job by prioritizing fast-running tasks. ...
    Marcus Herou
    Jul 2, 2009 at 9:11 am
    Jul 7, 2009 at 7:41 pm
  • Hi All, I want to know: why do we generally use the tmp directory (and not any other) for storing HDFS data, given that the tmp directory is meant only for temporary data? I was wondering ...
    Akhil1988
    Jul 17, 2009 at 7:56 pm
    Jul 28, 2009 at 7:17 pm
  • Hi, Thanks in advance for the help! I have a performance question relating to how fast I can expect Hadoop to scale. I'm running Cloudera's 0.18.3-10. I have a custom binary format, which is just Google ...
    William kinney
    Jul 28, 2009 at 4:19 pm
    Aug 1, 2009 at 1:35 am
  • Hi all, We figured out that anyone who has configured their local Hadoop with the remote cluster's Hadoop details and has the user name hadoop can get administrative rights on the cluster. For example, ...
    Palleti, Pallavi
    Jul 23, 2009 at 4:49 am
    Jul 31, 2009 at 5:22 am
  • I'm having problems dealing with my server manufacturer at the moment. Is there a good manufacturer to go with? Any advice is helpful, thanks. -Ryan
    Ryan Smith
    Jul 14, 2009 at 9:12 pm
    Jul 22, 2009 at 3:11 pm
  • Dear all, I am trying to set up a cluster of two machines using Hadoop. One machine is both namenode and jobtracker, the other machine is the datanode and tasktracker. I set up passwd-less ssh both ...
    Boyu Zhang
    Jul 15, 2009 at 7:57 pm
    Jul 17, 2009 at 7:43 pm
  • Hi. We're running a small 30-node cluster and in a few days will reinstall all the software, so I want to change the HDD configuration that was set up a long time ago and seems to be inefficient - each ...
    Dmitry Pushkarev
    Jul 13, 2009 at 6:51 pm
    Jul 20, 2009 at 7:49 pm
  • Hi Group, Finally I have written a sample Mapred program, submitted this job to Hadoop and got the expected results. Thanks to all of you! Now I don't have an idea of how to use Hadoop in real life ...
    Shravan Mahankali
    Jul 6, 2009 at 12:26 pm
    Jul 10, 2009 at 5:46 am
  • I am using Hadoop to store my media files, but when the number of files exceeds one million, Hadoop starts and then, after about 10-20 minutes, my datanode automatically goes down; the namenode log shows that the loss ...
    Mingyang
    Jul 17, 2009 at 1:11 pm
    Jul 22, 2009 at 2:07 am
  • Hi all, I'd like to remind everyone that RSVP is open for the next monthly Bay Area Hadoop user group organized by Yahoo!. Agenda and registration available here ...
    Dekel Tankel
    Jul 31, 2009 at 10:50 pm
    Mar 3, 2010 at 6:08 pm
  • I'm looking to build up data for an RDF Data store using Hadoop. I could just generate lots of RDF XML files - or a big one - and feed them into Apache Jena. However it seems to me that it would be ...
    Alex McLintock
    Jul 2, 2009 at 12:33 pm
    Jul 3, 2009 at 6:04 am
  • Hi, I'm a student working with Apache Mahout for the Google Summer of Code. We recently moved to 0.20.0, and I was porting my code to the new API. Unfortunately, I (and the whole project team) seem ...
    David Hall
    Jul 22, 2009 at 9:48 pm
    Jul 23, 2009 at 8:38 pm
  • I run ant clean, then: ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1 Below is the error output. Is this the right way to build libhdfs? 0.18.3 and 0.19.1 build libhdfs for me just fine. ...
    Ryan Smith
    Jul 14, 2009 at 6:37 pm
    Jul 14, 2009 at 8:20 pm
  • Hello, I am a bit confused about the local directories where each map/reduce task can store data. According to what I have read, dfs.data.dir - is the path on the local file system in which the ...
    Bonito
    Jul 1, 2009 at 3:56 pm
    Jul 1, 2009 at 10:04 pm
  • Dear All, I have a question in my mind about HDFS and I cannot find the answer from the documents on the apache website. I have a cluster of 4 machines, one is the namenode and the other 3 are ...
    Boyu Zhang
    Jul 23, 2009 at 7:16 pm
    Jul 24, 2009 at 6:33 pm
  • Hi everyone, I'm a Hadoop beginner. I noticed this in the web console after running several jobs. Every one of the jobs has the number of Spilled Records equal to Map output records, ...
    Mu Qiao
    Jul 12, 2009 at 10:55 am
    Jul 15, 2009 at 4:00 am
  • Hi. I am a beginner to the Hadoop Map/Reduce Framework. Is there a way I can access the static variables declared in my class in the map function? It goes like: public class test extends ...
    Smarthrish
    Jul 9, 2009 at 6:05 pm
    Jul 11, 2009 at 5:29 am
  • Hello, I'm new to hadoop, apologies if this is a repeat question. I'm trying to understand how we can implement a MapReduce job for what needs to be an ordered/sorted data set (where we require a ...
    Paul B
    Jul 7, 2009 at 9:35 pm
    Jul 9, 2009 at 6:50 am
  • Hi, I have a few zip files as input, they reside in one directory on HDFS. I want each node to take a zip file and work on it. Specifically, I want to take the zip files and write the binary contents ...
    Mark Kerzner
    Jul 7, 2009 at 3:28 am
    Jul 7, 2009 at 5:17 pm
  • Hi, take for example this line bin/hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input /usr/joe/wordcount/output How do I give additional jars? Thank you, Mark
    Mark Kerzner
    Jul 28, 2009 at 6:19 pm
    Jul 29, 2009 at 4:03 am
  • Hi, my output consists of a number of binary files, corresponding text files, and one descriptor file. Is there a way for my reducer to produce a zip of all binary files, another zip of all text ...
    Mark Kerzner
    Jul 23, 2009 at 12:53 am
    Jul 28, 2009 at 4:51 pm
  • I was wondering if someone could give me some answers or maybe some pointers where to look in the code. All these questions are in the same vein of hard drive failure. Question 1: If a master (system ...
    Ryan Smith
    Jul 23, 2009 at 7:14 pm
    Jul 24, 2009 at 6:02 pm
  • Hi all, We have built a system that uses Hadoop MapReduce to sort and index the input files. The index is straightforward: blocked-key = files+offsets. Then we can query the dataset with low latency. ...
    Zsongbo
    Jul 7, 2009 at 2:03 am
    Jul 22, 2009 at 9:00 pm
  • I have been told that it is not a good idea to keep HDFS files open for a long time. The reason sounded like a memory leak in the name node - that over time, the resources absorbed by an open file ...
    David B. Ritch
    Jul 2, 2009 at 12:22 pm
    Jul 21, 2009 at 6:25 pm
  • Hello, I am running Hadoop on my 4-node system. Initially, I picked a replication factor of 2, and nearly 100% of map tasks run locally on up to 3 nodes, but the ratio drops to 80% if I use all 4 ...
    Seunghwa Kang
    Jul 17, 2009 at 10:57 pm
    Jul 18, 2009 at 12:08 am
  • Hi, I am considering implementing a Partitioner that needs to access the parameters in the job's Configuration. However, there is no straightforward way to do this. Are there any suggestions? Thanks, ...
    Jianmin Woo
    Jul 14, 2009 at 9:27 am
    Jul 15, 2009 at 8:49 am
  • Hi, many times I want to sort by value instead of key. For instance, when counting the most used tags in blog posts or the ten most visited pages on a certain site and so on. Wondering if that is even ...
    Marcus Herou
    Jul 9, 2009 at 2:52 pm
    Jul 9, 2009 at 8:18 pm
  • Dear All: The NameNode is the single point of failure in Hadoop. I want to know how to make the NameNode highly available. imcaptor
    Imcaptor
    Jul 30, 2009 at 8:27 am
    Aug 12, 2009 at 9:02 pm
  • So.. I want to have different memory profiles for NameNode/DataNode/JobTracker/TaskTracker. But it looks like I only have one environment variable to modify, HADOOP_HEAPSIZE, but I might be running ...
    Fernando Padilla
    Jul 22, 2009 at 3:39 am
    Jul 24, 2009 at 2:28 am
  • Hi All, Just like method configure in Mapper interface, I am looking for its counterpart that will perform the closing operation for a Map task. For example, in method configure I start an external ...
    Akhil1988
    Jul 13, 2009 at 6:31 pm
    Jul 15, 2009 at 4:03 am
  • Hello, I wonder how the Yahoo! developers generated the Task Timeline figures in their "Hadoop Sorts a Petabyte..." blog post: ...
    Rares Vernica
    Jul 22, 2009 at 3:21 pm
    Aug 14, 2009 at 3:16 am
  • Hi guys, I have a set of 1000 gzipped plain text files. How to read them in Hadoop? Is there any built-in class available for it? Btw, I'm using hadoop-0.18.3. Regards, Prashant.
    Prashant ullegaddi
    Jul 31, 2009 at 3:02 pm
    Jul 31, 2009 at 5:06 pm
  • Hi, I set the number of reducers to 1, and I indeed get only one output file, /output/part-00000. However, in configure() and in close() I do a System.out, and I see that these are called three ...
    Mark Kerzner
    Jul 29, 2009 at 4:58 am
    Jul 31, 2009 at 12:49 am
  • Hi, In the hadoop documentation it says that all key-value classes need to implement Writable to allow serialization and de-serialization of outputs between mappers and reducers. Is this also ...
    Devajyoti Sarkar
    Jul 28, 2009 at 6:24 pm
    Jul 29, 2009 at 6:50 pm
  • I have a question about how Amazon Elastic MapReduce handles persistent content stored in S3. I'm interested in using AEMR, but I'm concerned about latency introduced by copying content from S3 into ...
    Larry Compton
    Jul 17, 2009 at 6:12 pm
    Jul 23, 2009 at 12:46 am
  • Hi all, I see there are two read methods in DFSInputStream: int read(byte buf[], int off, int len) and int read(long position, byte[] buffer, int offset, int length). I used the following code to test the read ...
    Martin Mituzas
    Jul 20, 2009 at 7:40 am
    Jul 21, 2009 at 7:04 am
  • Hi, I was trying out a map-reduce example using JobControl. I create a JobConf conf1 object, add the necessary information, then I create a job object Job job1 = new Job(conf1); and then I declare ...
    Rakhi Khatwani
    Jul 17, 2009 at 8:10 am
    Jul 20, 2009 at 3:58 pm
  • Hello. I would like to ask if there is any scenario in which the reduce-side join is preferable to the map-side join. One may claim that the map-side join requires preprocessing of the input sources. ...
    Bonito
    Jul 14, 2009 at 8:49 pm
    Jul 17, 2009 at 7:03 am
  • Hello, Is it possible to have more than one reducer in standalone mode? I am currently using 0.17.2.1 and I do: job.setNumReduceTasks(4); before starting the job and it seems that Hadoop overrides ...
    Rares Vernica
    Jul 12, 2009 at 7:21 pm
    Jul 14, 2009 at 12:45 pm
  • Hi Aaron, I don't get the meaning of your better solution. Could you tell me how to get the /etc/hosts files "squared away"? I also have the same problem in my Hadoop setup. It is very strange that the problem ...
    Ian jonhson
    Jul 6, 2009 at 9:07 am
    Jul 13, 2009 at 10:14 pm
  • Hi Jothi, We are trying to index around 245GB compressed data (~1TB uncompressed) on a 9 node Hadoop cluster with 8 slaves and 1 master. In Map, we are just parsing the files, passing the same to ...
    Prashant Ullegaddi
    Jul 9, 2009 at 7:31 pm
    Jul 11, 2009 at 7:07 pm
  • Hi All, Is there a recommended way to extract data from HDFS and perform some computations on it in order to display the results on a webpage? One thing that comes to my mind is to write ...
    Usman Waheed
    Jul 8, 2009 at 10:27 pm
    Jul 9, 2009 at 6:26 am
  • Hi. How can the mapper task know this is the last map() call? Do I need to intervene within the Hadoop framework? where? Thanks, - Uri
    Uri Shani
    Jul 6, 2009 at 11:40 am
    Jul 6, 2009 at 8:46 pm
  • Dear Hadoop devs, Please help me to figure out a way to program the following problem using Hadoop. I have a program which I need to invoke in parallel using Hadoop. The program takes an input ...
    Jaliya Ekanayake
    Jul 31, 2009 at 6:54 am
    Aug 21, 2009 at 4:22 am
  • Hello again! Yes, I know some of us are still recovering from OSCON. It's time for another delicious meetup to chat about Hadoop, HBase, Solr, Lucene, and more! UW is quite a pain for us to access ...
    Bradford Stephens
    Jul 27, 2009 at 7:16 pm
    Aug 5, 2009 at 5:38 pm
  • I'm trying to tie Hadoop into Cacti (following the steps mentioned on http://www.jointhegrid.com/svn/hadoop-cacti-jtg/trunk/doc/INSTALL.txt) and have two issues: 1. After all's done, images having ...
    Amandeep Khurana
    Jul 30, 2009 at 11:47 pm
    Jul 31, 2009 at 6:31 pm
  • Hi, I have data in PCAP file format (packet capture for network traffic). Is it possible to process this file in Hadoop in the same format? Or is there any supporting tool over Hadoop to analyze data from PCAP ...
    Wasim Bari
    Jul 28, 2009 at 3:58 pm
    Jul 31, 2009 at 6:05 am
  • Hello, I am trying to use DBInputFormat with a SQL Server 2000 database and got the error "incorrect syntax near LIMIT". Does Hadoop support SQL Server 2000? Thanks
    Po po
    Jul 28, 2009 at 4:23 am
    Jul 28, 2009 at 4:19 pm
  • Hi, I'm using Hadoop 0.20.0 (semidistributed mode, or whatever it's called -- I can't look up the name, since the documentation on the site seems to be down), and I'm experiencing a JobTracker crash ...
    Mathias De Maré
    Jul 17, 2009 at 9:31 am
    Jul 27, 2009 at 9:02 am
  • Hi, I am very new to Hadoop and I'm hoping someone can help me do a two-column sort. For my input, I have lines with 3 columns. I would like to sort the first column by string ascending and the ...
    David_ca
    Jul 16, 2009 at 5:42 pm
    Jul 17, 2009 at 6:05 am
Group Overview
group: common-user @
categories: hadoop
period: Jul 2009
discussions: 218
posts: 885
users: 212
website: hadoop.apache.org...
irc: #hadoop

212 users for July 2009

Ted Dunning: 53 posts; Jason Venner: 42 posts; Todd Lipcon: 36 posts; Aaron Kimball: 30 posts; Mark Kerzner: 29 posts; Steve Loughran: 23 posts; Marcus Herou: 21 posts; Owen O'Malley: 16 posts; Akhil1988: 15 posts; Boyu Zhang: 14 posts; Ryan Smith: 14 posts; Amandeep Khurana: 13 posts; Divij Durve: 11 posts; Harish Mallipeddi: 11 posts; Palleti, Pallavi: 11 posts; Scott Carey: 11 posts; William kinney: 11 posts; Alex Loddengaard: 10 posts; Sugandha Naolekar: 10 posts; Amareshwari Sriramadasu: 9 posts