FAQ


72 discussions - 306 posts

  • Hi, Can Hadoop run Map/Reduce directly on files in a local file system and would this make sense? Seems like there is a tradeoff to be made when you have to process lots and lots of little files. The ...
    Mfc
    Aug 26, 2007 at 3:22 pm
    Sep 2, 2007 at 4:13 am
  • Hi, first of all, thanks for Hadoop. It's amazing how much you can get done with a small hadoop job. My setup is a little bit different from the usual. I have a mid-sized Opteron machine with the ...
    Thorsten Schuett
    Aug 18, 2007 at 11:18 am
    Aug 24, 2007 at 2:33 pm
  • Hello All: I think I must be missing something fundamental. Is it possible to load compressed data into HDFS, and then operate on it directly with map/reduce? I see a lot of stuff in the docs about ...
    C G
    Aug 30, 2007 at 3:23 pm
    Sep 4, 2007 at 5:24 pm
  • Hi folks, Would be grateful if someone can help us understand why our secondary namenodes don't seem to be doing anything: 1. running 0.13.0 2. secondary namenode logs continuously spew: at ...
    Joydeep Sen Sarma
    Aug 24, 2007 at 1:33 am
    Sep 5, 2007 at 5:47 am
  • I am finding that it is a common pattern that multi-phase map-reduce programs I need to write very often have nearly degenerate map functions in second and later map-reduce phases. The only need for ...
    Ted Dunning
    Aug 22, 2007 at 5:56 pm
    Oct 27, 2007 at 12:22 am
  • Hi All: I tried 0.14.0 today with limited success. 0.13.0 was doing pretty well, but I'm not able to get as far with 0.14.0. My environment is single-node, 4-way box, 8G memory, 500G disk space. First ...
    C G
    Aug 23, 2007 at 11:12 pm
    Aug 25, 2007 at 3:05 pm
  • Few queries regarding the way data is loaded into HDFS. -Is it a common practice to load the data into HDFS only through the master node ? We are able to copy only around 35 logs (64K each) per ...
    Venkates .P.B.
    Aug 1, 2007 at 1:10 pm
    Aug 7, 2007 at 7:17 pm
  • Specifically, how can we express this query: Table1 contains: id, (list of ids) Table2 contains: id, f1 Where the Table1:list is a variable length list of foreign key (id) into Table2. We would like ...
    Joydeep Sen Sarma
    Aug 27, 2007 at 8:59 pm
    Aug 29, 2007 at 7:14 pm
  • Dear All, Hi, my name is Taeho and I am trying to figure out the maximum number of files a namenode can hold. The main reason for doing this is that I want to have some estimates on how many files I ...
    Taeho Kang
    Aug 28, 2007 at 6:59 am
    Aug 29, 2007 at 8:21 am
  • When I read the Hadoop documentation: The Hadoop Distributed File System: Architecture and Design (http://lucene.apache.org/hadoop/hdfs_design.html), a paragraph held my attention: “Moving Computation ...
    Samuel LEMOINE
    Aug 23, 2007 at 9:18 am
    Aug 24, 2007 at 7:46 am
  • Hi I have a map/reduce job that uses external jar files. How do I specify those jars in the classpath when submitting the mapred job using ./hadoop jar .... ? Suppose my map job relies on API in some ...
    Phantom
    Aug 13, 2007 at 11:49 pm
    Aug 14, 2007 at 5:39 pm
  • Hi all, We just moved to the 0.14.0 distribution of hadoop. Until now, we were running 0.10.1. Important point: the client submitting jobs is on a totally different machine from the master and ...
    Thomas Friol
    Aug 23, 2007 at 2:58 pm
    Aug 27, 2007 at 11:45 am
  • Anyone have any pointers debugging why an odd HDFS close is failing? Here is the exception I'm getting. 2007-08-22 21:45:21,459 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call ...
    Michael Stack
    Aug 22, 2007 at 10:19 pm
    Aug 22, 2007 at 11:27 pm
  • Hi, I have a hadoop application where each run of the map could potentially generate a large number of key-value pairs, which caused an out-of-memory error. I am wondering if there is a way to inform ...
    Eric Zhang
    Aug 20, 2007 at 7:32 pm
    Aug 21, 2007 at 6:27 pm
  • Hi folks, We had a weird thing where one of our data nodes was 100% disk full (all hdfs data) and the other nodes were uniformly at 20% space utilization. Just wondering if this is a bug or whether we ...
    Joydeep Sen Sarma
    Aug 17, 2007 at 10:24 pm
    Aug 18, 2007 at 12:25 am
  • Hi, In my hadoop setup, say I have 4 machines (M1 - M4). M1 is the master with the Job tracker. Say I want 4 parallel tasks on M1, 2 on M2/M3 and 6 on M4. I set the corresponding property ...
    Mahajan, Neeraj
    Aug 17, 2007 at 6:47 pm
    Aug 17, 2007 at 8:24 pm
  • I have a fairly simple job with a map, a local combiner and a reduce. The combiner and the reduce do the equivalent of a group_concat (mysql). I have horrible performance in the reduce stage: - the ...
    Joydeep Sen Sarma
    Aug 3, 2007 at 7:18 pm
    Aug 8, 2007 at 6:38 pm
  • Hi everyone! I'm still trying to understand the way hadoop works, and the possibilities offered for parallelizing java applications with hadoop (especially lucene-based ones). For the moment, I've ...
    Samuel LEMOINE
    Aug 1, 2007 at 12:15 pm
    Aug 3, 2007 at 7:37 am
  • Hello: According to the Map/Reduce workflow, it runs map first, then reduce. Could I run these pairs for several iterations (i.e. run map/reduce several times and stop on certain criteria, like ...
    ChaoChun Liang
    Aug 21, 2007 at 7:34 am
    Aug 21, 2007 at 4:12 pm
  • Hi all, I'm trying to translate this simple query to PigLatin but I remain stuck on the ordering. Given this table: (user,item) 12 145 13 192 12 145 12 133 13 164 13 192 12 145 I want to run this: ...
    Eric Palacios
    Aug 9, 2007 at 4:37 pm
    Aug 13, 2007 at 11:34 pm
  • Hi When I write data into HDFS do I always need to connect to the datanode to write the data ? Can I connect to any namenode to do so ? If so how does the datanode keep track of where the various ...
    Phantom
    Aug 7, 2007 at 9:59 pm
    Aug 7, 2007 at 10:22 pm
  • Hi, Page 3 of the "Hacking Pig" documentation suggests it is possible to LOAD a file, do stuff with it and then STORE it out... Is this correct or does it have to be done in Java? Regards, Shane PS. ...
    Shane Butler
    Aug 3, 2007 at 4:12 am
    Aug 6, 2007 at 8:24 pm
  • Hi, we are having problems with hbase scripts. Basically, when we run the stop script, it is not able to gracefully kill the HMaster (instead, it forks another HMaster that dies after a short time). The ...
    Michele Catasta
    Aug 27, 2007 at 2:45 pm
    Aug 28, 2007 at 11:09 pm
  • Does anyone have any ideas on this issue? Otherwise, if I were to write a patch to add this option for jobs to Hadoop, would it be useful for anyone else? Thanks Stu -----Original Message----- From: ...
    Stu Hood
    Aug 28, 2007 at 8:27 pm
    Aug 28, 2007 at 8:54 pm
  • Hello, We have text processing libraries developed in C/C++ and would like to use Hadoop to process a large data set. There does not seem to be much discussion of this issue (calling a C/C++ library under ...
    ChaoChun Liang
    Aug 16, 2007 at 7:43 am
    Aug 27, 2007 at 11:22 pm
  • Hello everybody! I've been trying to use the hadoop distributed file system from my java spring web application but without any good results :). We have one server where the hadoop namenode and datanode are ...
    Jani Arvonen
    Aug 21, 2007 at 12:59 pm
    Aug 22, 2007 at 3:45 pm
  • Hi, I am new to Hadoop. Looking at the documentation, I figured out how to write map and reduce functions but now I'm stuck... How do we work with the output file produced by the reducer? For ...
    Sebastien Rainville
    Aug 14, 2007 at 3:09 pm
    Aug 15, 2007 at 4:18 pm
  • To solve the checksum errors on the non-ECC memory machines, I modified some code in DFSClient.java and DataNode.java. The idea is very simple. The original CHUNK structure is {chunk size}{chunk ...
    Daeseong Kim
    Aug 14, 2007 at 3:06 am
    Aug 14, 2007 at 4:31 pm
  • Hi, In reduce phase, with outputValueGroupingComparator, we can sort all keys and then group values of a particular key together and send it to reduce() method. Is there a way to sort values of a ...
    Novice user
    Aug 7, 2007 at 5:12 am
    Aug 8, 2007 at 10:23 pm
  • I tried to create a job that set the # of Tasks to 900k and it hung and then ended up killing itself. The 900k should have been split over 300 machines but it never took. When I reduced the task # ...
    Derek Gottfrid
    Aug 31, 2007 at 7:14 pm
    Sep 5, 2007 at 3:51 am
  • Isn't that what the distcp script does? Thanks, Stu -----Original Message----- From: Joydeep Sen Sarma Sent: Friday, August 31, 2007 3:58pm To: hadoop-user@lucene.apache.org Subject: Re: Compression ...
    Stu Hood
    Aug 31, 2007 at 9:23 pm
    Sep 1, 2007 at 5:49 am
  • Can nobody help me? Why does hadoop write to /root/? I thought that with the folder settings in the hadoop configuration files I could set all the needed folders. Kind regards, Frank
    Otto, Frank
    Aug 24, 2007 at 12:16 pm
    Aug 27, 2007 at 6:54 am
  • Hi, I followed the command described in README.txt under ~/src/examples/pipes to execute wordcount-simple. % bin/hadoop pipes -conf src/examples/pipes/conf/word.xml -input in-dir -output ...
    ChaoChun Liang
    Aug 22, 2007 at 3:14 am
    Aug 23, 2007 at 5:10 pm
  • Hi, I need to serve a large number of audio/video files from a web server. Each file is in the range of a few MB. These files will be generated by a background process. Each file is ...
    Manoj Bist
    Aug 19, 2007 at 10:34 pm
    Aug 19, 2007 at 11:56 pm
  • Hi, We're seeing some of our map tasks hanging indefinitely during execution, and I just wanted to check if somebody maybe had seen similar things (to figure out whether things are a hadoop problem ...
    Eyal Oren
    Aug 14, 2007 at 7:48 pm
    Aug 15, 2007 at 10:02 am
  • I have been using hadoop for my work and it was working well until yesterday. Since yesterday, I have suddenly been getting the error below when I run start-all.sh. Jobtracker is failing to start with the ...
    Novice user
    Aug 6, 2007 at 5:25 am
    Aug 6, 2007 at 5:46 am
  • Hi I'm trying to pass .gz files as input to hadoop, and at the end of mapreduce, the number of input records read from the input files is around 480, and when I uncompress the files, the number of ...
    Sandhya E
    Aug 2, 2007 at 6:36 am
    Aug 3, 2007 at 5:38 pm
  • Hi all, We are using Hadoop streaming to reuse our existing code. We have successfully run our code on a single node. Now comes the problem: how can the mapper and reducer modules be distributed ...
    Yiping Han
    Aug 29, 2007 at 11:55 pm
    Aug 30, 2007 at 4:54 am
  • Hello everyone, I have found very few documents about the hadoop webapp. Could someone show me the way? Thank you!
    Wu zhi hua
    Aug 29, 2007 at 11:51 am
    Aug 29, 2007 at 5:12 pm
  • Hello everyone, when I downloaded "hadoop-2007-08-16_16-45-34.tar.gz" and then ran the command "ant -Dcompile.c++=yes examples", the console displayed the error: Buildfile: build.xml init: [mkdir] ...
    Wu zhi hua
    Aug 17, 2007 at 4:14 pm
    Aug 29, 2007 at 11:44 am
  • Hi folks, I am a little puzzled by what look to me like records that I am emitting from my combiner but that are not showing up under 'combine output records' (and seem to be disappearing). ...
    Joydeep Sen Sarma
    Aug 22, 2007 at 12:30 am
    Aug 22, 2007 at 1:06 am
  • I am wondering what the most efficient way would be to handle the following scenario with map reduce in hadoop. Let's say we have the following data: time=1, ip=1, a=1 time=2, ip=2, a=2 time=3, ip=2, b=4 ...
    Torsten Curdt
    Aug 20, 2007 at 8:55 pm
    Aug 20, 2007 at 11:31 pm
  • Hi all. I have a job where my map will be transforming files and throwing out malformed records, etc. Another step in this job is to perform lookups based on certain fields in the records. Think ...
    Jason gessner
    Aug 18, 2007 at 3:54 pm
    Aug 20, 2007 at 4:24 am
  • The wiki page http://wiki.apache.org/lucene-hadoop/HowToConfigure implies that mapred-default.xml is read for the dfs configuration, as well as for mapreduce jobs. But this doesn't appear to be true ...
    Michael Bieniosek
    Aug 16, 2007 at 8:32 pm
    Aug 16, 2007 at 8:56 pm
  • Hi, My Reduce jobs do not write any data to disk but fire off a network call to an RPC server with the data. However, all reduce jobs are getting killed with the following error message: Task ...
    Phantom
    Aug 16, 2007 at 2:37 pm
    Aug 16, 2007 at 6:02 pm
  • Hi all I'm in trouble with ObjectWritable. I'm trying to implement a simple indexation with Lucene & Hadoop, and for that I take inspiration from nutch code. In the Indexer.java of nutch, line 245, I ...
    Samuel LEMOINE
    Aug 16, 2007 at 8:38 am
    Aug 16, 2007 at 3:29 pm
  • I want to partition my data by using R reducer tasks to produce R reduce output files, and each reduce task also writes a binary file for the corresponding partition on DFS. Is there an easy way to ...
    Hao Zheng
    Aug 6, 2007 at 3:29 am
    Aug 8, 2007 at 10:09 pm
  • Hey guys, my data nodes are logging this exception occasionally. I'm using hadoop-0.12.3. 2007-08-06 09:20:00,185 ERROR org.apache.hadoop.dfs.DataNode: DataXCeiver java.io.IOException: Unknown opcode ...
    Moonwatcher32329
    Aug 6, 2007 at 4:35 pm
    Aug 6, 2007 at 7:30 pm
  • 1. Where can I find documentation on how to set up a DFS cluster (multiple nodes)? 2. Is there a DFS client (I want to transfer the file from box Y, which is not in the DFS cluster, to the cluster for ...
    S d
    Aug 6, 2007 at 7:57 am
    Aug 6, 2007 at 8:59 am
  • Getting a permission denied error when configuring the Hadoop plugin for Eclipse (IBM Plugin). It looks like it does not connect to the Hadoop server. Any thoughts? Thanks, dt www.ejinz.com
    Dmitry
    Aug 4, 2007 at 8:34 am
    Aug 6, 2007 at 6:05 am
Group Overview
period: Aug 2007
group: common-user @
categories: hadoop
discussions: 72
posts: 306
users: 91
website: hadoop.apache.org...
irc: #hadoop

91 users for August 2007

Ted Dunning: 29 posts Joydeep Sen Sarma: 20 posts Doug Cutting: 16 posts Arun C Murthy: 12 posts Raghu Angadi: 12 posts Owen O'Malley: 10 posts C G: 9 posts Mfc: 8 posts ChaoChun Liang: 7 posts Michael Stack: 7 posts Samuel LEMOINE: 7 posts Dennis Kubes: 6 posts Stu Hood: 6 posts Dhruba Borthakur: 5 posts Eric Baldeschwieler: 5 posts Konstantin Shvachko: 5 posts Michael Bieniosek: 5 posts Phantom: 5 posts Thorsten Schuett: 5 posts Enis Soztutar: 4 posts