143 discussions - 542 posts

  • Hello everyone, as I don't have experience with large-scale clusters, I cannot figure out why inter-rack communication in a MapReduce job is "significantly" slower than intra-rack. I saw Cisco ...
    Elton sky
    Jun 6, 2011 at 7:23 am
    Jun 7, 2011 at 11:23 am
  • Folks, I've been digging into the potential benefits of using 10 Gigabit Ethernet (10GbE) NIC server connections for Hadoop and wanted to run what I've come up with through initial research by the ...
    Saqib Jang -- Margalla Communications
    Jun 28, 2011 at 5:17 pm
    Jul 11, 2011 at 2:21 pm
  • Hi all, Are there instructions anywhere on how to change the default ports of Hadoop and HDFS? My main interest is in default port 8020. Thanks, George --
    George Kousiouris
    Jun 2, 2011 at 2:23 pm
    Jun 3, 2011 at 12:38 pm
  • Hello, my namenode is running with the following exceptions and going into safe mode every time it tries to start the datanodes. Why so? I deleted all the files in HDFS and ran it again! ...
    Praveenesh kumar
    Jun 7, 2011 at 9:50 am
    Jun 7, 2011 at 7:32 pm
  • Hello, I'm working with an application to calculate the temperatures of a square board. I divide the board into a mesh, and represent the board as a list of (key, value) pairs with a key being the ...
    Alberto Andreotti
    Jun 21, 2011 at 3:04 pm
    Jun 23, 2011 at 1:21 pm
  • Hi, I have two logs which should have all the records for the same record_id, in other words, if this record_id is found in the first log, it should also be found in the second one. However, I ...
    Mark Kerzner
    Jun 26, 2011 at 4:40 am
    Jun 27, 2011 at 8:00 am
  • Hi All, Is append to an existing file now supported in Hadoop for production clusters? If yes, please let me know which version and how. Thanks, Jagaran
    Jagaran das
    Jun 13, 2011 at 6:08 pm
    Jun 22, 2011 at 2:48 am
  • Hi, We have a requirement where a huge number of small files would be pushed to HDFS and then analyzed with Pig. To get around the classic "Small File Issue" we merge the files and ...
    Jagaran das
    Jun 17, 2011 at 12:34 am
    Jun 21, 2011 at 10:11 am
  • Hello, I'm quite new to Hadoop, so I'd like to get an understanding of something. Let's say I have a task that requires 16 GB of memory in order to execute. Let's say hypothetically it's some sort of ...
    Ian Upright
    Jun 15, 2011 at 8:51 pm
    Jun 17, 2011 at 1:14 am
  • I'm trying to run a MapReduce task against a cluster of 4 DataNodes with 4 cores each. My input data is 4GB in size and it's split into 100MB files. Current configuration is default so block size is ...
    Juan P.
    Jun 27, 2011 at 7:50 pm
    Jun 28, 2011 at 6:48 pm
  • Guys, I was using the Hadoop Eclipse plugin on a Hadoop 0.20.2 cluster. It was working fine for me. I was using Eclipse SDK Helios 3.6.2 with the plugin "hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar" ...
    Praveenesh kumar
    Jun 22, 2011 at 5:56 am
    Jun 23, 2011 at 4:19 am
  • I want/need to upgrade my namenode/secondary node hardware. It actually also acts as one of the datanodes. I could not find any how-to guides. So what is the process to switch from one hardware to the ...
    MilleBii
    Jun 14, 2011 at 9:01 pm
    Jun 20, 2011 at 6:25 am
  • Hi, I see Hadoop needs Unix (or Cygwin on Windows) to run. It would be much nicer if Hadoop got away from the shell scripts through appropriate Ant scripts or a Java admin console kind of ...
    Raja Nagendra Kumar
    Jun 12, 2011 at 2:01 am
    Jun 14, 2011 at 11:00 am
  • Hi Everyone: I am quite new to Hadoop. I am attempting to set up Hadoop locally on two machines, connected by LAN. Both of them pass the single-node test. However, I failed in the two-node cluster ...
    Jingwei Lu
    Jun 27, 2011 at 8:24 pm
    Jun 28, 2011 at 2:31 pm
  • Hi, We are a start-up company that has been using the Hadoop cluster platform (version 0.20.2) in the Amazon EC2 environment. We tried to set up a cluster in two different forms: Cluster 1: includes 1 ...
    אבי ווקנין
    Jun 20, 2011 at 4:59 pm
    Jun 21, 2011 at 11:58 am
  • Hi, I am new to hadoop (Just 1 month old). These are the steps I followed to install and run hadoop-0.20.203.0: 1) Downloaded tar file from ...
    Rutesh
    Jun 15, 2011 at 4:54 pm
    Jun 17, 2011 at 5:23 am
  • I am having a problem starting Hadoop. I set up my single-node cluster. Please help me sort this out! hadoop@ashishpc:~$ /usr/local/hadoop/bin/start-all.sh starting namenode, logging ...
    Ashish Tamrakar
    Jun 30, 2011 at 6:48 pm
    Jul 1, 2011 at 9:01 am
  • Hi Folks, In the hadoop-env.sh, we find, ... # Where log files are stored. $HADOOP_HOME/logs by default. # export HADOOP_LOG_DIR=${HADOOP_HOME}/logs is there any reason this location could not be a ...
    Jack Craig
    Jun 22, 2011 at 8:02 pm
    Jun 25, 2011 at 6:07 pm
  • I want to extract the key-value pairs from a MapWritable, cast them into Integer (key) and Double (value) types, and add them to another collection. I'm attempting the following but this code is ...
    Dhruv Kumar
    Jun 21, 2011 at 8:14 pm
    Jun 21, 2011 at 8:42 pm
  • Dear all, I ran several map-reduce jobs in a Hadoop cluster of 4 nodes. This time I want one map-reduce job to run after another. For example, to make my point clear, suppose a wordcount is run on ...
    Adarsh Sharma
    Jun 2, 2011 at 11:18 am
    Jun 21, 2011 at 11:15 am
  • Dear all, I'm looking for ways to improve the namenode heap size usage of an 800-node 10PB testing Hadoop cluster that stores around 30 million files. Here's some info: 1 x namenode: 32GB RAM, 24GB ...
    Siuon
    Jun 10, 2011 at 11:33 am
    Jun 13, 2011 at 4:13 pm
  • Hi, I'm trying to read the inputSplit over and over using following function in MapperRunner: @Override public void run(RecordReader input, OutputCollector output, Reporter reporter) throws ...
    Mark question
    Jun 8, 2011 at 5:55 am
    Jun 8, 2011 at 9:20 pm
  • I fancy passing a whole Ruby app to streaming, so I don't need to bother with Ruby file dependencies. For example, ./streaming ... -mapper 'ruby aaa/bbb/ccc' -files aaa <--- pass the folder ...
    Guang-Nan Cheng
    Jun 28, 2011 at 4:20 pm
    Apr 10, 2012 at 9:59 pm
  • Hello, I have problems using Rumen in hadoop-0.21.0. Firstly, the log files used by Rumen are named like: job_201105301222_0007_username, but it said "WARN rumen.TraceBuilder: File skipped: Invalid ...
    FujizawaMiyuki
    Jun 15, 2011 at 6:10 pm
    Dec 23, 2011 at 11:47 am
  • Hi, The Wordcount example shipped with Hadoop expects the output directory not to exist, and if one exists (due to a previous session), it throws an error like this: ...
    Raja Nagendra Kumar
    Jun 20, 2011 at 11:12 am
    Jun 20, 2011 at 3:42 pm
  • Hi guys, I'm going through Hadoop The Definitive Guide trying to understand how to use DistributedCache (0.20.2) to make a configuration file available to my Mapper in every node of the cluster. The ...
    Juan P.
    Jun 6, 2011 at 8:49 pm
    Jun 9, 2011 at 1:50 pm
  • Hi, I'm not able to see my email in the mail archive, so I'm sending it again. Guys, I need your feedback! Thanks, Praveenesh ---------- Forwarded message ---------- From: praveenesh kumar ...
    Praveenesh kumar
    Jun 6, 2011 at 8:49 am
    Jun 6, 2011 at 3:46 pm
  • Newbie on Hadoop clusters. I have set up my two-node configuration as described by M. G. Noll http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ The data node has ...
    MilleBii
    Jun 1, 2011 at 7:29 pm
    Jun 2, 2011 at 9:28 am
  • We use Hadoop/HDFS to archive data. I archive a lot of files by creating one large tar file and then placing it in HDFS. Is it better to use Hadoop archive for this or is it essentially the same thing? ...
    Rita
    Jun 27, 2011 at 10:07 am
    Jul 7, 2011 at 1:52 am
  • Hi, I am using Apache Whirr to set up a Hadoop cluster on EC2 using Hadoop 0.22.0 SNAPSHOT (nightly) builds from Jenkins. For details, see [1,2]. (Is there a better place where I can get nightly ...
    Paolo Castagna
    Jun 30, 2011 at 6:54 am
    Jul 3, 2011 at 2:17 am
  • Hi all, I'd like to select N random records from a large amount of data using Hadoop, and I wonder how I can achieve this. Currently my idea is to let each mapper task select N / mapper_number ...
    Jeff Zhang
    Jun 27, 2011 at 7:12 am
    Jun 27, 2011 at 8:02 pm
  • Hi, Is a queue-like structure supported in HDFS, where a stream of data is processed as it's generated? Specifically, I will have a stream of data coming in, and a data-independent operation needs to be ...
    Saumitra Shahapure
    Jun 24, 2011 at 5:12 pm
    Jun 27, 2011 at 11:13 am
  • Hi, I have been attempting to set up a single node Hadoop cluster (by following http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/) on my personal computer ...
    Ziyad Mir
    Jun 20, 2011 at 11:24 pm
    Jun 25, 2011 at 3:37 am
  • Hi All, I'm a bit confused about the values displayed on the 'jobtracker.jsp' page. In particular, there's a section called 'Cluster Summary'. I'm running a small 4-machine Hadoop cluster, and when I ...
    Ken Williams
    Jun 1, 2011 at 11:42 am
    Jun 22, 2011 at 11:41 pm
  • Hello, I'm trying to run the example from the quick start guide on Windows and I get this error: $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' Exception in thread "main" ...
    Drew Gross
    Jun 16, 2011 at 5:33 am
    Jun 22, 2011 at 5:57 pm
  • By using context.getTaskAttemptID().getTaskID(), we can get the task id of the reducer, but this code does not work in the partitioner. Can anyone tell me how to get the Reducer id in the partitioner? I ...
    Somnath
    Jun 14, 2011 at 5:22 pm
    Jun 21, 2011 at 10:31 am
  • I'm trying to set up Hadoop on Fedora 15. I have set JAVA_HOME (i.e. configured Linux to see my Java installation) and I have altered hadoop-env.sh so that JAVA_HOME is set as my JDK folder, but when I ...
    J3VS
    Jun 11, 2011 at 1:20 pm
    Jun 13, 2011 at 4:15 pm
  • Hey guys, Really trying to get our namenode back up and running after a full disk error last night. I've freed up a lot of space, however the NameNode still fails to startup: 2011-06-12 10:26:09,042 ...
    Ryan LeCompte
    Jun 12, 2011 at 2:29 pm
    Jun 12, 2011 at 4:10 pm
  • Environment: Mac 10.6.x. Hadoop version: hadoop-0.20.2-cdh3u0 Is there any good reference/link that provides configuration of additional data-nodes on a single machine (in pseudo distributed mode). ...
    Kumar Kandasami
    Jun 10, 2011 at 3:34 am
    Jun 11, 2011 at 6:46 am
  • Hi, I am wondering whether there is any built-in function to automatically add a self-incrementing line number to reducer output (like a relational DB auto-key). I have this problem because in 0.19.2 API, I ...
    Shi Yu
    Jun 7, 2011 at 4:21 pm
    Jun 10, 2011 at 4:47 pm
  • Hi, We're interested in using HDFS to store several large file sets to be available for download by our customers in the following paradigm: Customer <- | APPSERVER-CLUSTER {app1,app2,app3} | <- | ...
    Joe Greenawalt
    Jun 3, 2011 at 8:56 pm
    Jun 7, 2011 at 7:31 pm
  • Hello, I wanted to know if anyone has any tips or tutorials on how to install a Hadoop cluster across multiple data centers. Do you need SSH connectivity between the nodes across these data centers? ...
    Sanjeev Taran
    Jun 7, 2011 at 5:08 am
    Jun 7, 2011 at 12:58 pm
  • Hi, I have my source code written against the 0.19.1 Hadoop API and want to move it to the newer API 0.20.20. Any pointers to good documentation on migrating from the older version to the newer version will be very ...
    Prashant Sharma
    Jun 28, 2011 at 12:13 pm
    Jul 25, 2011 at 6:32 am
  • Hi, Currently I am running Hadoop-0.19.1. I want to migrate from Hadoop-0.19.1 to Hadoop-0.20.2. Can anyone suggest how to go ahead? The main tasks are API migration and DFS migration. Thanks ...
    Rajesh putta
    Jun 29, 2011 at 5:03 am
    Jun 29, 2011 at 5:21 am
  • (-general@, +common-user@ -- Please use general@ only for project wide discussions) User jobs do not need visibility of the java processes to submit jobs. Specifically, are you facing any issues ...
    Harsh J
    Jun 24, 2011 at 5:09 am
    Jun 26, 2011 at 7:32 am
  • Hi, My specific question is: is it possible to control the split of LZO files by customizing the LZO index files? The background of the problem is: I have a file which has the following format key1 ...
    Shi Yu
    Jun 23, 2011 at 8:59 pm
    Jun 24, 2011 at 7:01 am
  • Why don't you call them "directors" and "workers" instead of "masters" and "slaves" ? Mark
    Mark Hedges
    Jun 13, 2011 at 12:26 am
    Jun 21, 2011 at 10:48 am
  • Is there a way to insert Lucene indexes into an HBase table during the indexing stage? Also, does it help to run Lucene in a multi-node environment? -- View this message in context: ...
    Rsriramtce
    Jun 18, 2011 at 7:07 pm
    Jun 20, 2011 at 6:57 pm
  • I heard that an HDFS file can serve as a producer-consumer queue. Can a file really be used as a queue? I am very confused.
    Ltomuno
    Jun 13, 2011 at 6:56 am
    Jun 13, 2011 at 2:53 pm
  • Hello, What is the best way to compress several files already in Hadoop?
    Madhu Ramanna
    Jun 10, 2011 at 11:24 pm
    Jun 12, 2011 at 5:57 pm
Group Overview
group: common-user @
period: Jun 2011
categories: hadoop
discussions: 143
posts: 542
users: 169
website: hadoop.apache.org...
irc: #hadoop

169 users for June 2011

  • Harsh J: 32 posts
  • Shi Yu: 20 posts
  • Jagaran das: 19 posts
  • Praveenesh kumar: 19 posts
  • Madhu phatak: 17 posts
  • Steve Loughran: 15 posts
  • Mark question: 14 posts
  • Joey Echeverria: 12 posts
  • MilleBii: 12 posts
  • GOEKE, MATTHEW (AG/1000): 11 posts
  • Bharath Mundlapudi: 10 posts
  • George Kousiouris: 9 posts
  • John Armstrong: 9 posts
  • Michel Segel: 9 posts
  • Kumar Kandasami: 8 posts
  • Jeff Schmitz: 7 posts
  • Alberto Andreotti: 7 posts
  • Darren Govoni: 7 posts
  • Eric Charles: 7 posts
  • Mark Kerzner: 7 posts