175 discussions - 708 posts

  • Hello all, I'm curious to see how many people are using EC2 to execute their Hadoop cluster and map/reduce programs, and how many are using home-grown datacenters. It seems like the 20 node limit ...
    Ryan LeCompte
    Sep 2, 2008 at 5:44 am
    Sep 4, 2008 at 1:27 pm
  • Hi all. We all know the importance of the NameNode for a cluster: when the NameNode, which holds all the metadata of the DFS, breaks down, all of the cluster's data is gone. So, if we can retrieve DFS ...
    叶双明
    Sep 8, 2008 at 2:40 pm
    Sep 16, 2008 at 12:40 am
  • Hello, I observe that scp of data to the namenode is faster than actually putting into dfs (all nodes coming from same switch and have same ethernet cards, homogenous nodes)? I understand that "dfs ...
    Prasad Pingali
    Sep 17, 2008 at 12:16 pm
    Sep 19, 2008 at 5:32 am
  • Hi, We are developing a project and we intend to use Hadoop to handle the processing of a vast amount of data. But to convince our customers about using Hadoop in our project, we must show them ...
    Trinh Tuan Cuong
    Sep 24, 2008 at 8:55 am
    Oct 4, 2008 at 3:57 am
  • Hi guys, I am using Hadoop on an EC2 cluster and am trying to send files onto the HDFS from an external machine. It works up to the point where I get this error message: *Waiting to find target node: ...
    Julien Nioche
    Sep 5, 2008 at 5:40 pm
    May 15, 2009 at 2:27 pm
  • I am trying to use SequenceFiles with LZO compression outside the context of a MapReduce application. However, when I try to use the LZO codec, I get the following errors in the log: 08/09/30 ...
    Nathan Marz
    Sep 30, 2008 at 6:15 pm
    Oct 10, 2008 at 7:32 am
  • I have the same problem on our cluster. It seems the reducer-tasks are using all cpu, long before there's anything to shuffle. I started a profile of the reduce-task. I've attached the profiling ...
    Espen Amble Kolstad
    Sep 4, 2008 at 1:07 pm
    Sep 11, 2008 at 1:51 am
  • Hi. By setting isSplitable to false, we can assign one file with n records to one mapper. Is there any way to treat one complete file as a single record? Thanks in advance, Chandravadana S This e-mail and any files ...
    Chandra
    Sep 24, 2008 at 9:17 am
    Mar 17, 2009 at 3:22 am
  • Hi We've been thinking of using Hadoop for a decision making system which will analyze telecom-related data from various sources to take certain decisions. The data can be huge, of the order of ...
    Arijit Mukherjee
    Sep 24, 2008 at 8:34 am
    Oct 8, 2008 at 7:35 am
  • Hey all. We've been running into a very annoying problem pretty frequently lately. We'll be running some job, for instance a distcp, and it'll be moving along quite nicely, until all of a sudden, ...
    Bryan Duxbury
    Sep 26, 2008 at 11:36 pm
    Oct 5, 2008 at 6:25 am
  • Hi, all! How do you manage a large cluster, e.g. more than 2000 nodes? How do you configure hostnames and IPs: use DNS? How do you configure slaves: all in the slaves file? How do you update software on all nodes? Any practice, ...
    叶双明
    Sep 11, 2008 at 9:16 am
    Sep 16, 2008 at 12:42 pm
  • Hi, Here is a basic doubt. I found that different documentation mentions that Hadoop is not recommended for online applications. Can anyone please elaborate on this? Regards, Sourav ...
    Souravm
    Sep 12, 2008 at 6:47 pm
    Sep 15, 2008 at 9:50 am
  • Hi. Is it possible to use Hadoop for a real-time app in the video processing field? Regards.
    Stas Oskin
    Sep 23, 2008 at 7:51 pm
    Oct 20, 2008 at 7:04 pm
  • Hi, I have a serious problem that I'm not sure how to fix. I have two M/R phases that calculate a matrix in parallel. It works... but it's slower than the serial version (by about 100 times). Here ...
    Sandy
    Sep 19, 2008 at 8:00 pm
    Sep 20, 2008 at 5:29 pm
  • Hi all- One more question. I'm looking for a lightweight way to serve data stored as key-value pairs in a series of MapFiles or SequenceFiles. HBase/Hypertable offer a very robust, powerful solution ...
    Chris Dyer
    Sep 18, 2008 at 5:05 am
    Sep 19, 2008 at 7:43 pm
  • We are already using Thrift to move and store our log data and I'm looking into how I could read the stored log data into MapReduce processes. This article ...
    Juho Mäkinen
    Sep 2, 2008 at 7:54 am
    Sep 4, 2008 at 5:28 pm
  • Hi all, I have an application that I used to run with the "hadoop jar" command. I have now written an optimized version of the mapper in C. I have run this using the streaming library and everything ...
    Christian Søttrup
    Sep 14, 2008 at 11:13 pm
    Sep 18, 2008 at 9:13 pm
  • I know that normally one datanode runs on one computer. I am wondering: can I run multiple datanodes on one PC?
    叶双明
    Sep 4, 2008 at 3:20 am
    Sep 9, 2008 at 1:38 am
  • Hi, There are various technologies on top of Hadoop such as HBase, Hive, Pig and more. I was wondering what are the differences between them. What are the usage scenarios that fit each one of them. ...
    Naama Kraus
    Sep 3, 2008 at 12:05 pm
    Sep 8, 2008 at 4:09 pm
  • Does anyone have a har/unhar utility? Or at least a format description? It looks pretty obvious, but just in case. Thanks
    Dmitry Pushkarev
    Sep 3, 2008 at 9:52 am
    Sep 4, 2008 at 10:29 am
  • (New to this list) Hi, My research group is setting up a small (20-node) cluster. All of these machines are linked by NFS. We have a fairly entrenched codebase/development cycle, and in particular ...
    David Hall
    Sep 21, 2008 at 9:06 pm
    Sep 26, 2008 at 6:09 pm
  • Hi, I'm trying to refine my map reduce algorithm to run faster, but I ran into a little bit of trouble. In my main, I have the following parameters set for my conf: ...
    Sandy
    Sep 24, 2008 at 12:42 am
    Sep 24, 2008 at 4:10 pm
  • Hi all, I am getting an OutOfMemoryError, as shown below, when I run map-reduce on a huge amount of data: java.lang.OutOfMemoryError: Java heap space at ...
    Pallavi Palleti
    Sep 17, 2008 at 12:36 pm
    Sep 19, 2008 at 8:42 am
  • Hi All, I'm facing a problem in configuring hdfs in a fully distributed way in Mac OSX. Here is the topology - 1. The namenode is in machine 1 2. There is 1 datanode in machine 2 Now when I execute ...
    Souravm
    Sep 16, 2008 at 6:10 am
    Sep 18, 2008 at 6:23 am
  • I'm trying to use JavaSerialization for a series of MapReduce jobs, and when it comes to reading a SequenceFile using SequenceFileInputFormat with JavaSerialized objects, something breaks down. I've ...
    Jason Grey
    Sep 16, 2008 at 4:47 pm
    Sep 17, 2008 at 5:49 pm
  • This method's signature is {code} T deserialize(T); {code} But, the RecordReader next method is {code} boolean next(K,V); {code} So, if the deserialize method does not return the same T (i.e., K or ...
    Pete Wyckoff
    Sep 12, 2008 at 9:03 pm
    Sep 12, 2008 at 10:45 pm
  • Hi, This may be a silly question, but I'm strangely having trouble finding an answer for it (perhaps I'm looking in the wrong places?). Suppose I have a cluster with n nodes each with m processors. I ...
    Sandy
    Sep 7, 2008 at 6:26 pm
    Sep 10, 2008 at 10:02 pm
  • I'm attempting to load data into hadoop (version 0.17.1), from a non-datanode machine in the cluster. I can run jobs and copyFromLocal works fine, but when I try to use distcp I get the below. I'm ...
    Michael Di Domenico
    Sep 8, 2008 at 3:58 am
    Sep 9, 2008 at 6:17 pm
  • Hi, I'm trying to build an index using the "index" contrib in Hadoop 0.18.0, but the reduce tasks are consistently failing. In the output from the "hadoop jar" command, I see messages like this: ...
    Joe Shaw
    Sep 25, 2008 at 9:27 pm
    Sep 26, 2008 at 3:25 pm
  • Hi Stefan, Are the slides from the Katta presentation up somewhere? If not then could you please post them? Thanks, Deepika
    Deepika Khera
    Sep 22, 2008 at 8:34 pm
    Sep 26, 2008 at 1:20 am
  • Hi, I wonder if someone is aware of any measurement of Hadoop scalability with the cluster size, e.g., read/write/append throughput on a cluster of 5, 10, 30, 50, 100 nodes, or something similar. These ...
    Guilherme Menezes
    Sep 21, 2008 at 4:40 pm
    Sep 23, 2008 at 8:58 pm
  • Hello to all, I have been attracted by the Hadoop project while looking for a solution for my application. Basically, I have an application hosting user generated content (images, sounds, videos) and ...
    Monchanin Eric
    Sep 12, 2008 at 9:49 am
    Sep 14, 2008 at 1:05 am
  • Hi all! The NameNode is a Single Point of Failure for the HDFS Cluster. There is support for NameNodeFailover, with a SecondaryNameNode hosted on a separate machine being able to stand in for the ...
    叶双明
    Sep 6, 2008 at 9:28 am
    Sep 9, 2008 at 12:38 am
  • Can we use something like a RAM FS to share static data across map tasks? Scenario: 1) quad-core machine, 2) two 1-TB disks, 3) 8 GB RAM. Now I need ~2.7 GB RAM per map process to load some static data in ...
    Amit Kumar Singh
    Sep 5, 2008 at 8:00 am
    Sep 6, 2008 at 7:05 pm
  • Beginner's question: If I have a cluster with a single node that has a max of 1 map/1 reduce, and the job submitted has 50 maps... Then it will process only 1 map at a time. Does that mean that it's ...
    Ryan LeCompte
    Sep 3, 2008 at 4:01 am
    Sep 5, 2008 at 12:49 pm
  • Hello, I'm trying to upload a fairly large file (18GB or so) to my AWS S3 account via bin/hadoop fs -put ... s3://... It copies for a good 15 or 20 minutes, and then eventually errors out with a ...
    Ryan LeCompte
    Sep 1, 2008 at 8:32 pm
    Sep 3, 2008 at 1:35 pm
  • Hey all, Why is it that FileSystem.rename returns true or false instead of throwing an exception? It seems incredibly inconvenient to get a false result and then have to go poring over the namenode ...
    Bryan Duxbury
    Sep 30, 2008 at 8:38 pm
    Oct 1, 2008 at 2:19 am
  • Hi, I would like to measure the disk i/o performance of our hadoop cluster. However, running iostat on 16 nodes is rather cumbersome. Does dfs keep track of any stats like the number of blocks or ...
    Shirley Cohen
    Sep 29, 2008 at 11:46 pm
    Sep 30, 2008 at 3:18 am
  • Hi list, By default Hadoop sorts by keys. Can it sort by values rather than keys? Regards, Jeremy -- My research interests are distributed systems, parallel computing and ...
    Jeremy Chow
    Sep 25, 2008 at 2:23 am
    Sep 28, 2008 at 6:35 pm
  • Hi, I want to sort my records (consisting of string, int, float) using Hadoop. One way I have found is to set the number of reducers = 1, but this would mean all the records go to 1 reducer and it won't ...
    Tenaali Ram
    Sep 12, 2008 at 8:59 pm
    Sep 25, 2008 at 3:01 am
  • Hi folks; I have a small cluster, but each node is big- 8 cores each, with lots of IO bandwidth. I'd like to increase the number of simultaneous map and reduce tasks scheduled per node from the ...
    Joel Welling
    Sep 23, 2008 at 6:43 pm
    Sep 23, 2008 at 11:23 pm
  • Hello all, I'd love to be able to upload into HDFS very large files (e.g., 8 or 10GB), but it seems like my only option is to chop up the file into smaller pieces. Otherwise, after a while I get ...
    Ryan LeCompte
    Sep 22, 2008 at 3:09 pm
    Sep 23, 2008 at 2:49 pm
  • Hello all, I'm setting up a small 3 node hadoop cluster (1 node for namenode/jobtracker and the other two for datanode/tasktracker). The map tasks finish fine, but the reduce tasks are failing at ...
    Ryan LeCompte
    Sep 21, 2008 at 2:08 am
    Sep 22, 2008 at 11:57 pm
  • Hi, I am new to hadoop. For my map/reduce task I want to write my own custom writable class. Could anyone please let me know where exactly to place the customwritable.java file? I found that in ...
    Deepak Diwakar
    Sep 18, 2008 at 12:10 pm
    Sep 19, 2008 at 1:27 am
  • Hello, A strange thing happened in my job. In the reduce phase, one of the tasks' status shows 101.44% complete, runs up to some 102%, and then finishes successfully back at 100%. Is this the right behavior? I ...
    Pvvpr
    Sep 16, 2008 at 7:26 pm
    Sep 17, 2008 at 5:28 am
  • Hi All, I am considering using HDFS for an application that potentially has many small files – ie 10-100 million files with an estimated average filesize of 50-100k (perhaps smaller) and is an online ...
    Peter McTaggart
    Sep 15, 2008 at 4:14 am
    Sep 16, 2008 at 3:13 pm
  • Hi, we're running 100 XLarge instances (ec2), with a gig of heap space for each task - and are seeing the following error frequently (but not always): ##### BEGIN PASTE ##### [exec] 08/09/03 11:21:09 ...
    Florian Leibert
    Sep 3, 2008 at 4:44 pm
    Sep 10, 2008 at 10:44 pm
  • Hey all Scale Unlimited is putting together some case studies for an upcoming class and wants to get a snapshot of what the Hadoop user community looks like. If you have 2 minutes, please feel free ...
    Chris K Wensel
    Sep 8, 2008 at 5:45 pm
    Sep 10, 2008 at 3:21 pm
  • Hi, I'm running on hadoop-0.18.0. I have an m-r job that executes correctly in standalone mode. However, when run on a cluster, the same job produces zero output. It is very bizarre. I looked in the ...
    Shirley Cohen
    Sep 4, 2008 at 5:07 pm
    Sep 9, 2008 at 7:07 pm
  • Hello, I was wondering if anyone has gotten far at all with getting Hadoop up and running with EC2 + EBS? Any luck getting this to work in a way that the HDFS runs on the EBS so that it isn't blown ...
    Ryan LeCompte
    Sep 5, 2008 at 11:00 pm
    Sep 8, 2008 at 5:26 pm
Group Overview
group: common-user @
categories: hadoop
discussions: 175
posts: 708
users: 186
website: hadoop.apache.org...
irc: #hadoop

186 users for September 2008

  • Owen O'Malley: 35 posts
  • 叶双明: 35 posts
  • Ryan LeCompte: 32 posts
  • Sandy: 22 posts
  • Raghu Angadi: 20 posts
  • Steve Loughran: 17 posts
  • Dmitry Pushkarev: 16 posts
  • Arun C Murthy: 15 posts
  • Chris Douglas: 15 posts
  • Edward J. Yoon: 12 posts
  • Devaraj Das: 10 posts
  • James Moore: 10 posts
  • Pvvpr: 10 posts
  • Shirley Cohen: 10 posts
  • Dennis Kubes: 9 posts
  • Pete Wyckoff: 9 posts
  • Tom White: 9 posts
  • Jean-Daniel Cryans: 8 posts
  • Shengkai Zhu: 8 posts
  • Souravm: 8 posts