167 discussions - 657 posts

  • Hi, I am looking into using Hadoop streaming to parallelize some simple programs. So far the performance has been pretty disappointing. The cluster contains 5 nodes. Each node has two CPU cores. The ...
    Lin
    Mar 31, 2008 at 8:01 pm
    Apr 3, 2008 at 8:22 pm
  • Hi, Wasn't there going to be a live stream from the Hadoop summit? I couldn't find any references on the event site/page, and searches on veoh, youtube and google video yielded nothing. Is an ...
    Otis Gospodnetic
    Mar 26, 2008 at 3:03 am
    May 15, 2008 at 6:58 am
  • Hi everyone, I ran a distributed system that consists of 50 spiders/crawlers and 8 server nodes with a Hadoop DFS cluster with 8 datanodes and a namenode... Each spider has 5 job processing / data ...
    André Martin
    Mar 21, 2008 at 9:25 pm
    Apr 1, 2008 at 12:15 pm
  • I noticed when reading http://wiki.apache.org/hadoop/HardwareBenchmarks the following comment: "I ran into some odd behavior on Herd2 where if i [ . . . ] the reducers don't start until the mappers ...
    Marc Harris
    Mar 3, 2008 at 7:26 pm
    Mar 11, 2008 at 6:19 pm
  • Hi, I am developing a simple inverted index program on Hadoop. My map function has the output <word, doc> and the reducer has <word, list(docs)>. Now I want to use one more mapreduce to remove ...
    Aayush Garg
    Mar 26, 2008 at 4:39 pm
    Jun 14, 2012 at 6:13 am
  • Hi, I've seen in http://wiki.apache.org/nutch-data/attachments/Presentations/attachments/oscon05.pdf (slide 12) that Nutch has extensions to MapReduce. I wanted to ask whether these are part of the ...
    Naama Kraus
    Mar 6, 2008 at 12:58 pm
    Mar 9, 2008 at 7:06 am
  • Hello, I'm just starting to dig into Hadoop and testing its feasibility for large scale development work. I was wondering if anyone else is being affected by these issues using hadoop 0.16.0? I ...
    Holden Robbins
    Mar 1, 2008 at 7:19 pm
    Mar 3, 2008 at 7:11 pm
  • I have some confusion over the use of Amazon S3 as storage. I was looking at the fs.default.name as the name node -- a host and a port the client uses to ask the name node to perform DFS services. ...
    Steve Sapovits
    Mar 1, 2008 at 5:11 pm
    Mar 2, 2008 at 6:38 pm
  • I've got 2 datanodes set up with the following configuration parameter: <property> <name>dfs.datanode.du.reserved</name> <value>429496729600</value> <description>Reserved space in bytes per volume. ...
    Jimmy Wan
    Mar 6, 2008 at 5:57 pm
    Mar 12, 2008 at 8:48 pm
  • Hi all, I'm new to hadoop and I wanted to know how to dynamically add a slave to my cluster, obviously while it's running. Thanks in advance, John -- View this message in context: ...
    Tjohn
    Mar 9, 2008 at 4:54 pm
    Mar 10, 2008 at 4:17 pm
  • Hi, I want to copy 1000 files (37GB) of data to the dfs. I have a setup of 9-10 nodes, each of which has between 5 and 15GB of free space. While copying the files from the local file system on nodeA, the ...
    Alfonso Olias Sanz
    Mar 24, 2008 at 2:36 pm
    Mar 26, 2008 at 6:25 pm
  • Hi, What is the best/right way to handle partitioning of the final job output (i.e. output of reduce tasks)? In my case, I am processing logs whose entries include dates (e.g. "2008-03-01 foo bar ...
    Otis Gospodnetic
    Mar 18, 2008 at 11:36 pm
    Mar 20, 2008 at 11:11 pm
  • Hi, I'd be interested in information about interfaces to HDFS other than the DFSShell commands. I've seen threads about dfs and fuse, dfs and WebDav. Could anyone provide more details or point me to ...
    Naama Kraus
    Mar 11, 2008 at 7:17 am
    Mar 13, 2008 at 12:46 am
  • I have found that storing each column in its own gzip file can really speed up processing time on arbitrary subsets of columns. For example suppose I have two CSV files called csv_file1.gz and ...
    Richard K. Turner
    Mar 10, 2008 at 6:02 pm
    Mar 11, 2008 at 7:36 pm
  • Hi colleagues, I have set up a single node cluster to test the pipes examples. wordcount-simple and wordcount-part work just fine, but wordcount-nopipe won't run. Here is my command line: bin/hadoop ...
    11 Nov.
    Mar 3, 2008 at 4:38 pm
    Jun 19, 2009 at 1:54 am
  • Hi, I have been unsuccessfully trying to set the map output value class different to the one reduce outputs (in 0.16.0). AFAIK the following should do the trick: ...
    Chang Hu
    Mar 24, 2008 at 8:23 pm
    Mar 24, 2008 at 10:58 pm
  • Have been working my way through the Map-Reduce tutorial. Just got the WordCount example working. One thing that concerns me is the time it took to run. 11 seconds is the fastest it's been able to ...
    Jason Rennie
    Mar 11, 2008 at 8:43 pm
    Mar 12, 2008 at 4:14 pm
  • Hi there. After reading a bit about the hadoop framework and trying the WordCount example, I have several doubts about how to use map/reduce with binary files. In my case binary files are generated in ...
    Alfonso Olias Sanz
    Mar 17, 2008 at 1:20 pm
    Mar 19, 2008 at 5:58 pm
  • Hi friends, I have made a cluster of 3 machines, one of them is master, and other 2 slaves. I executed a mapreduce job on master but after Map, the execution terminates and Reduce doesn't happen. I ...
    Ved Prakash
    Mar 10, 2008 at 5:10 am
    Mar 18, 2008 at 7:49 am
  • Hi, I have a question regarding file permissions. I have a kind of workflow where I submit a job from my laptop to a remote hadoop cluster. After the job has finished I do some file operations on the ...
    Johannes Zillmann
    Mar 13, 2008 at 12:48 am
    Mar 17, 2008 at 6:56 pm
  • Dear colleagues, I have a question on HBase's index implementation. How does HBase find the data for a given row key? Does it use an index like a database, or a hash function? I suppose that a hash ...
    Bin YANG
    Mar 6, 2008 at 8:40 am
    Mar 6, 2008 at 7:09 pm
  • Hi All, I have been trying to configure Hadoop on EC2 for a large number of clusters (100 plus). It seems that I have to copy the EC2 private key to all the machines in the cluster so that they can have ...
    Prasan Ary
    Mar 20, 2008 at 5:16 pm
    Mar 20, 2008 at 7:23 pm
  • hey all I have about 40 jobs in a batch i'm running. but consistently one particular mr job hangs at the tail of the copy or at the beginning of the sort (it 'looks' like it's still copying, but it ...
    Chris K Wensel
    Mar 13, 2008 at 6:22 pm
    Mar 14, 2008 at 5:13 pm
  • Hi everybody, I'm trying to compile fuse-dfs but I have problems. I don't have a lot of experience with C++. I would like to know: Is there a clear readme file with the instructions to compile, install ...
    Xavier Quintuna
    Mar 10, 2008 at 10:23 pm
    Mar 11, 2008 at 6:31 pm
  • Hi Guys, I am having problems creating clusters on 2 machines. Machine configuration: Master: OS: Fedora core 7 hadoop-0.15.2 hadoop-site.xml listing <configuration> <property> <name> ...
    Ved Prakash
    Mar 5, 2008 at 6:51 am
    Mar 7, 2008 at 11:48 am
  • Hello, Are there any Hadoop documentation resources showing how to run the current version of HBase on a single node? Thanks, Peter W.
    Peter W.
    Mar 17, 2008 at 7:05 pm
    May 9, 2008 at 3:29 pm
  • Hello, I'm working with Hadoop 0.16.1. I have an issue with the DFS. Sometimes when writing to the HDFS it gets blocked. Sometimes it doesn't happen, so it's not easily reproducible. My cluster has ...
    Iván de Prado
    Mar 28, 2008 at 1:09 pm
    Apr 2, 2008 at 11:41 am
  • Hi, I have a small Hadoop cluster, one master and three slaves. When I try the example wordcount on one of our log files (size ~350 MB), Map runs fine but reduce always hangs (sometimes around 19%, 60% ...
    Natarajan, Senthil
    Mar 27, 2008 at 2:11 pm
    Mar 31, 2008 at 1:40 am
  • I am running hadoop on EC2. I want to run a jar MR application on EC2 such that input and output files are on S3. I configured hadoop-site.xml so that fs.default.name property points to my s3 bucket ...
    Prasan Ary
    Mar 25, 2008 at 8:07 pm
    Mar 26, 2008 at 10:42 pm
  • I have two questions: - I was wondering if an HDFS client can be invoked from a Flash application. - What are the available APIs for HDFS? (I read that there is a C/C++ api for Hadoop Map/Reduce but ...
    Cagdas Gerede
    Mar 18, 2008 at 5:54 pm
    Mar 24, 2008 at 2:35 pm
  • Hello, I am working on developing my first hadoop app from scratch. It is a Monte-Carlo simulation, and I am using the PiEstimator code from the examples as a reference. I believe I have what I want ...
    Stephen J. Barr
    Mar 22, 2008 at 1:34 am
    Mar 22, 2008 at 5:19 pm
  • setting up a simple hadoop cluster with two machines, i've gotten to the point where the two machines can see each other, things seem fine, but i'm trying to set up the master as both a master and a ...
    Colin Freas
    Mar 21, 2008 at 5:40 pm
    Mar 21, 2008 at 9:18 pm
  • Hi All, I am writing a java implementation for my map/reduce function on hadoop. Input to this is an XML file, and the map function has to process well-formed XML records. So far I have been unable ...
    Prasan Ary
    Mar 3, 2008 at 11:31 pm
    Mar 19, 2008 at 8:24 am
  • I recently upgraded from Hadoop 0.14.1 to 0.16.1. Previously in 0.14.1, if a map or reduce task threw a runtime exception such as an NPE, the task, and ultimately the job, would fail in short order. ...
    Matt Kent
    Mar 17, 2008 at 10:14 pm
    Mar 18, 2008 at 5:01 am
  • Hi, Is it possible to configure a hadoop cluster so that data nodes and worker nodes are separate? I.e. when nodes 1,2,3 store data in HDFS and nodes 3,4 and 5 do the ...
    Andrey Pankov
    Mar 13, 2008 at 8:42 am
    Mar 14, 2008 at 7:12 pm
  • Hi, In our system, we plan to upload data into Hadoop from external sources and use it later on for analysis tasks. The interface to the external repositories allows us to fetch pieces of data in ...
    Naama Kraus
    Mar 10, 2008 at 8:58 am
    Mar 11, 2008 at 6:26 pm
  • Hi, I need to identify which file a key came from in the map phase. Is it possible? What I have is multiple types of log files in one directory that I need to process for my application. ...
    Tarandeep Singh
    Mar 5, 2008 at 1:39 am
    Mar 6, 2008 at 2:15 am
  • Hello, I'm working on a large amount of logs, and I've noticed that the distribution of data on the network (./hadoop dfs -put input input) takes a lot of time. Let's say that my data is already ...
    Jean-Pierre
    Mar 27, 2008 at 7:43 pm
    Mar 28, 2008 at 3:54 pm
  • Hi, we're looking for options for creating a scalable storage solution based on commodity hardware for media files (spacewise dominated video files of a few hundred MB but also to store up to a few ...
    Robert Krüger
    Mar 27, 2008 at 9:14 am
    Mar 27, 2008 at 8:22 pm
  • Hi, I am using hdfsWrite to write data to a file. Whenever I close the file and reopen it for writing, it starts writing from position 0 (overwriting the old data). Is there any way to append ...
    Raghavendra K
    Mar 27, 2008 at 6:29 am
    Mar 27, 2008 at 5:35 pm
  • Hi, we're currently evaluating the use of Hadoop's HDFS for a project where most of the data will be large files and latency will not matter that much, so it should be suited perfectly in those ...
    Robert Krüger
    Mar 22, 2008 at 5:33 pm
    Mar 24, 2008 at 3:14 pm
  • i'm working to set up a cluster across several machines where users' home dirs are on an nfs mount. i setup key authentication for the hadoop user, install all the software on one node, get ...
    Colin Freas
    Mar 21, 2008 at 4:25 pm
    Mar 22, 2008 at 11:27 am
  • The properties mentioned here: http://wiki.apache.org/hadoop/FAQ#13 have been deprecated in favor of two separate properties: mapred.tasktracker.map.tasks.maximum ...
    Jimmy Wan
    Mar 18, 2008 at 11:44 pm
    Mar 20, 2008 at 11:45 pm
  • Hello, I have two machines that act as clients to HDFS. Node #1 has the Trash option enabled (e.g. fs.trash.interval set to 60) and Node #2 has the Trash option off (e.g. fs.trash.interval set to 0) ...
    Taeho Kang
    Mar 19, 2008 at 10:13 am
    Mar 20, 2008 at 4:34 pm
  • Hi! I'm trying to run a streaming job on Hadoop 0.16.0. I've distributed the scripts to be used to all nodes: time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper ...
    Andreas Kostyrka
    Mar 18, 2008 at 9:18 pm
    Mar 19, 2008 at 10:04 am
  • Hey gang, I know that map/reduce functions will accept any subclass of Writable as input values if the Mapper/Reducer classes are declared as <WritableComparable, Writable, ... . Then you can use ...
    Stu Hood
    Mar 16, 2008 at 9:55 pm
    Mar 17, 2008 at 5:09 pm
  • Hi, I have just started using hadoop and HDFS. I have done the WordCount test application, which gets some input files, processes the files, and generates an output file. I have a similar application, ...
    Alfonso Olias Sanz
    Mar 15, 2008 at 12:06 am
    Mar 17, 2008 at 4:02 pm
  • I have a question. As we know, the name node forms a single point of failure. In a production environment, I imagine a name node would run in a data center. If that data center fails, how would you a ...
    Cagdas Gerede
    Mar 13, 2008 at 8:51 pm
    Mar 13, 2008 at 9:27 pm
  • I have a very large xml file as input and a couple of Map/Reduce functions. Input key/value pair to all of my map functions is the same. I was wondering if there is a way that I read the input xml ...
    Prasan Ary
    Mar 12, 2008 at 4:25 pm
    Mar 12, 2008 at 6:14 pm
  • I have seen examples of connecting to hbase using php which mention hshellconnect.class.php. I would like to know where I can download this file, or whether there is any alternative way to connect to hbase ...
    Ved Prakash
    Mar 11, 2008 at 11:20 am
    Mar 11, 2008 at 6:57 pm
Group Overview
group: common-user @
period: Mar 2008
category: hadoop
discussions: 167
posts: 657
users: 161
website: hadoop.apache.org...
irc: #hadoop

161 users for March 2008

  • Ted Dunning: 56 posts
  • Amar Kamat: 21 posts
  • Chris K Wensel: 20 posts
  • Alfonso Olias Sanz: 16 posts
  • Doug Cutting: 13 posts
  • Naama Kraus: 13 posts
  • Owen O'Malley: 13 posts
  • Ved Prakash: 13 posts
  • Cagdas Gerede: 12 posts
  • Prasan Ary: 12 posts
  • André Martin: 10 posts
  • Andrey Pankov: 10 posts
  • Colin Freas: 10 posts
  • dhruba Borthakur: 10 posts
  • Joydeep Sen Sarma: 10 posts
  • Theodore Van Rooy: 10 posts
  • Andreas Kostyrka: 9 posts
  • Arun C Murthy: 9 posts
  • Jason Venner: 9 posts
  • Otis Gospodnetic: 9 posts