FAQ


199 discussions - 881 posts

  • Hi. I have 2 questions about HDFS performance: 1) How fast are the read and write operations over network, in Mbps per second? 2) If the chunk server is located on same host as the client, is there ...
    Stas Oskin
    Apr 9, 2009 at 10:46 pm
    Jan 17, 2010 at 5:27 pm
  • Hi. I have quite a strange issue, where one of the datanodes that I have, rejects any blocks with error messages. I looked in the datanode logs, and found the following error: 2009-04-21 16:59:19,092 ...
    Stas Oskin
    Apr 21, 2009 at 7:22 pm
    Apr 25, 2009 at 3:47 pm
  • Hey all I recently setup a three node hadoop cluster and ran an examples on it. It was pretty fast, and all the three nodes were being used (I checked the log files to make sure that the slaves are ...
    Mithila Nagendra
    Apr 12, 2009 at 3:40 pm
    Apr 17, 2009 at 6:28 pm
  • FYI Amazons new Hadoop offering: http://aws.amazon.com/elasticmapreduce/ And Cascading 1.0 supports it: http://www.cascading.org/2009/04/amazon-elastic-mapreduce.html cheers, ckw -- Chris K Wensel ...
    Chris K Wensel
    Apr 2, 2009 at 7:48 am
    Apr 21, 2009 at 10:55 am
  • hello hadoop users, Recently I had a chance to lead a team building a log-processing system that uses Hadoop and MySQL. The system's goal was to process the incoming information as quickly as ...
    Ankur Goel
    Apr 28, 2009 at 10:47 am
    Apr 29, 2009 at 8:23 pm
  • Hi, I want to make experiments with wordcount example in a different way. Suppose we have very large data. Instead of splitting all the data one time, we want to feed some splits in the map-reduce ...
    Aayush Garg
    Apr 7, 2009 at 12:36 am
    Apr 16, 2009 at 4:49 pm
  • I'm setting up a Hadoop cluster and I have the name node and job tracker up and running. However, I cannot get any of my datanodes or tasktrackers to start. Here is my hadoop-site.xml file... <?xml ...
    Jpe30
    Apr 15, 2009 at 6:40 pm
    Apr 27, 2009 at 3:56 pm
  • Hi all, I am currently processing a lot of raw CSV data and producing a summary text file which I load into mysql. On top of this I have a PHP application to generate tiles for google mapping (sample ...
    Tim robertson
    Apr 14, 2009 at 9:35 am
    Apr 24, 2009 at 3:47 am
  • Hi I am a new Hadoop user. I have a small cluster with 3 Datanodes. In hadoop-site.xml values of dfs.replication property is 2 but then also it is replicating data on 3 machines. Please tell why is ...
    Puri, Aseem
    Apr 10, 2009 at 4:57 am
    Apr 17, 2009 at 4:27 am
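
    The usual answer to the dfs.replication question above: it is a client-side, per-file setting applied when a file is created, so files written before the change (or written by a client whose own config still had the default of 3) keep their old replication factor. A minimal hadoop-site.xml sketch, assuming the setting is distributed to every machine that writes to HDFS:

    ```xml
    <!-- dfs.replication is read by the *client* at file-creation time, so it
         must be present in the config of every writing node, not just the
         namenode. -->
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    ```

    For data that already exists, `hadoop fs -setrep -R 2 /` lowers the factor of existing files.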
  • Greetings, Would anybody be willing to join a PNW Hadoop and/or Lucene User Group with me in the Seattle area? I can donate some facilities, etc. -- I also always have topics to speak about :) ...
    Bradford Stephens
    Apr 16, 2009 at 10:40 pm
    Jun 3, 2009 at 9:36 pm
  • Has anyone seen this before? Our task tracker produced a 2.7 gig log file in a few hours. The entry is all the same (every 2 ms): 2009-04-30 02:34:40,207 INFO org.apache.hadoop.mapred.TaskTracker: ...
    Lance Riedel
    Apr 30, 2009 at 4:46 pm
    May 14, 2009 at 5:02 pm
  • I jumped into Hadoop at the 'deep end'. I know pig, hive, and hbase support the ability to max(). I am writing my own max() over a simple one column dataset. The best solution I came up with was ...
    Edward Capriolo
    Apr 18, 2009 at 4:00 pm
    Apr 22, 2009 at 8:01 am
  • (Hadoop is used in the benchmarks) http://database.cs.brown.edu/sigmod09/ There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although ...
    Guilherme Germoglio
    Apr 14, 2009 at 2:17 pm
    Apr 21, 2009 at 2:47 pm
  • Hey all, I was doing some research on I/O patterns of our applications, and I noticed the attached pattern. In case if the mail server strips out attachments, I also uploaded it: ...
    Brian Bockelman
    Apr 13, 2009 at 1:55 am
    Apr 16, 2009 at 11:54 am
  • Hi. I'm trying to use the API to get the overall used and free spaces. I tried this function getUsed(), but it always returns 0. Any idea? Thanks.
    Stas Oskin
    Apr 8, 2009 at 8:13 am
    Jun 23, 2010 at 8:53 pm
  • Hi, We are planning to use hadoop for some very expensive and long running processing tasks. The computing nodes that we plan to use are very heavy in terms of CPU and memory requirement e.g one ...
    Amit handa
    Apr 25, 2009 at 5:37 am
    Apr 28, 2009 at 9:32 am
  • Hi, I'm using Hadoop 0.19.1 and I have a very small test cluster with 9 nodes, 8 of them being task trackers. I'm getting the following error and my jobs keep failing when map processes start hitting ...
    Jim Twensky
    Apr 8, 2009 at 2:22 am
    Apr 22, 2009 at 8:03 am
  • I created a JIRA (https://issues.apache.org/jira/browse/HADOOP-5615) with a spec file for building a 0.19.1 RPM. I like the idea of Cloudera's RPM file very much. In particular, it has nifty ...
    Ian Soboroff
    Apr 2, 2009 at 7:47 pm
    Apr 21, 2009 at 10:38 am
  • Hello, Can anyone tell me if there is any way running a map-reduce job from a java program without specifying the jar file by JobConf.setJar() method? Thanks, -- Mohammad Farhan Husain Research ...
    Farhan Husain
    Apr 1, 2009 at 5:57 pm
    Apr 7, 2009 at 7:11 pm
  • Dear all, I wrote a plugin codes for Hadoop, which calls the interfaces in Cpp-built .so library. The plugin codes are written in java, so I prepared a JNI class to encapsulate the C interfaces. The ...
    Ian jonhson
    Apr 30, 2009 at 3:19 am
    May 1, 2009 at 2:39 am
  • I set my "mapred.tasktracker.map.tasks.maximum" to 10, but when I run a task, it's only using 2 out of 10, any way to know why it's only using 2? thanks
    Javateck javateck
    Apr 21, 2009 at 8:20 pm
    Apr 22, 2009 at 12:29 am
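
    Two frequent causes for the "only 2 of 10 slots used" symptom above: the property was changed on the client or jobtracker but not in the config of the tasktracker nodes (each TaskTracker reads it at its own startup, so the slaves must be updated and restarted), or the job simply has only 2 input splits, since the setting is an upper bound on concurrent maps per node, not a target. A hypothetical hadoop-site.xml fragment for the slave nodes:

    ```xml
    <!-- Read by each TaskTracker at startup; set on every slave node and
         restart the tasktrackers. A job with only 2 input splits will still
         run only 2 map tasks regardless of this bound. -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>10</value>
    </property>
    ```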
  • Hey all I'm trying to connect two separate Hadoop clusters. Is it possible to do so? I need data to be shuttled back and forth between the two clusters. Any suggestions? Thank you! Mithila Nagendra ...
    Mithila Nagendra
    Apr 7, 2009 at 4:49 am
    Apr 7, 2009 at 8:04 pm
  • I need to use the output of the reduce, but I don't know how to do. use the wordcount program as an example if i want to collect the wordcount into a hashtable for further use, how can i do? the ...
    Andy2005cst
    Apr 2, 2009 at 9:41 am
    Apr 2, 2009 at 6:59 pm
  • Hi. I've heard that HDFS starts to slow down after it's been running for a long time. And I believe I've experienced this. So, I was thinking to set up a cron job to execute every week to shutdown ...
    Marc Limotte
    Apr 24, 2009 at 4:31 pm
    Apr 26, 2009 at 6:35 am
  • Hi all! I have a MR job use to import contents into HBase. The content is text file in HDFS. I used the maps file to store local path of contents. Each content has the map file. ( the map is a text ...
    Nguyenhuynh.mr
    Apr 22, 2009 at 7:36 am
    Apr 24, 2009 at 6:15 am
  • Hello all: I am new to Hadoop and Map Reduce. I am writing a program to analyze some census data. I have a general question with MapReduce: In the Reducer, how can I separate keys to do separate ...
    Reza
    Apr 18, 2009 at 5:21 pm
    Apr 20, 2009 at 3:18 am
  • Hi, I had a flaky machine the other day that was still accepting jobs and sending heartbeats, but caused all reduce task attempts to fail. This in turn caused the whole job to fail because the same ...
    Stefan Will
    Apr 6, 2009 at 4:45 pm
    Apr 14, 2009 at 11:56 am
  • Hi. I know that there were some hard to find bugs with replication set to 2, which caused data loss to HDFS users. Was there any progress with these issues, and if there any fixes which were ...
    Stas Oskin
    Apr 10, 2009 at 4:12 pm
    Apr 10, 2009 at 7:24 pm
  • Hello all, I need to do some calculations that has to merge two sets of very large data (basically calculate variance). One set contains a set of "means" and the second a set of objects tied to a ...
    Christian Ulrik Søttrup
    Apr 4, 2009 at 9:12 pm
    Apr 5, 2009 at 8:06 pm
  • Can someone tell whether a file will occupy one or more blocks? for example, the default block size is 64MB, and if I save a 4k file to HDFS, will the 4K file occupy the whole 64MB block alone? so in ...
    Javateck javateck
    Apr 3, 2009 at 1:45 am
    May 7, 2009 at 1:11 am
  • Hi all, I am writing an application in which I create a forked process to execute a specific Map/Reduce job. The problem is that when I try to read the output stream of the forked process I get ...
    Razen Al Harbi
    Apr 28, 2009 at 9:15 am
    Apr 30, 2009 at 9:15 am
  • Hi all! I have the large String and I want to write it into the file in HDFS. (The large string has 100.000 lines.) Current, I use method copyBytes of class org.apache.hadoop.io.IOUtils. But the ...
    Nguyenhuynh.mr
    Apr 29, 2009 at 2:48 am
    Apr 29, 2009 at 10:15 am
  • If I understand correctly - Hadoop forms a general purpose cluster on which you can execute jobs? We have a Java data processing application here that follows the Producer - Consumer pattern. It has ...
    Adam Retter
    Apr 28, 2009 at 10:06 am
    Apr 29, 2009 at 8:37 am
  • Hi there, We're working on an image analysis project. The image processing code is written in Matlab. If I invoke that code from a shell script and then use that shell script within Hadoop streaming, ...
    Sameer Tilak
    Apr 21, 2009 at 5:56 pm
    Apr 23, 2009 at 8:25 pm
  • Suppose a SequenceFile (containing keys and values that are BytesWritable) is used as input. Will it be divided into InputSplits? If so, what's the criteria use for splitting? I'm interested in this ...
    Barnet Wagman
    Apr 20, 2009 at 2:24 am
    Apr 23, 2009 at 3:08 pm
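
    On the SequenceFile question above: SequenceFiles are splittable. Splits are cut at roughly block-sized byte offsets, and each record reader then seeks forward to the next sync marker in the file, so no record is read by two map tasks. The split size can be nudged upward with a config knob (hypothetical fragment, using the 0.19-era property name):

    ```xml
    <!-- Raises the lower bound on split size, reducing the number of map
         tasks over a SequenceFile input. 134217728 bytes = 128 MB. -->
    <property>
      <name>mapred.min.split.size</name>
      <value>134217728</value>
    </property>
    ```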
  • Hi there, I setup a small cluster for testing. When I start my cluster on my master node, I have to type the password for starting each datanode and tasktracker. That's pretty annoying and may be ...
    Yabo-Arber Xu
    Apr 22, 2009 at 3:56 am
    Apr 23, 2009 at 1:24 am
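
    For the password prompts above, the standard fix is a passphrase-less SSH key on the master whose public half is appended to each slave's authorized_keys, so the start scripts can log in without prompting. A minimal sketch, assuming the default OpenSSH layout:

    ```shell
    # Create a passphrase-less RSA key and authorize it locally; copy
    # id_rsa.pub into ~/.ssh/authorized_keys on each slave the same way.
    mkdir -p ~/.ssh
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa -q
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    ```

    After this, `ssh slave-hostname` from the master should open a shell with no prompt, and start-all.sh will start every datanode and tasktracker unattended.
    
    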
  • Hi, I would like to implement a Multi-threaded reducer. As per my understanding , the system does not have one coz we expect the output to be sorted. However, in my case I dont need the output ...
    Sagar Naik
    Apr 10, 2009 at 6:13 pm
    Apr 13, 2009 at 3:03 pm
  • I checked out hadoop-core-0.19 export CFLAGS=$CUSTROOT/include export LDFLAGS=$CUSTROOT/lib (they contain lzo which was built with --shared) lzo1a.h lzo1b.h lzo1c.h lzo1f.h lzo1.h lzo1x.h lzo1y.h ...
    Saptarshi Guha
    Apr 1, 2009 at 6:29 pm
    Apr 6, 2009 at 4:07 am
  • My project of parsing through material for a semantic search engine requires me to use the http://nlp.stanford.edu/software/lex-parser.shtml Stanford NLP parser on hadoop cluster. To use the Stanford ...
    Hari939
    Apr 18, 2009 at 12:18 pm
    Jul 2, 2009 at 4:41 am
  • hi, If I write a large file to HDFS, will it be split into blocks and multi-blocks are written to HDFS at the same time? Or HDFS can only write block by block? Thanks. -- View this message in ...
    Xie, Tao
    Apr 27, 2009 at 9:22 am
    Apr 30, 2009 at 3:56 am
  • Hi Under <hadoop-tmp-dir /mapred/local there are directories like "attempt_200904262046_0026_m_000002_0" Each of these directories contains files of format: intermediate.1 intermediate.2 ...
    Sandhya E
    Apr 28, 2009 at 7:02 am
    Apr 28, 2009 at 10:46 am
  • No, I didn't mark 0.19.1 stable. I left 0.18.3 as our most stable release. My company skipped deploying 0.19.x so I have no experience with that branch. Others? Nige
    Nigel Daley
    Apr 23, 2009 at 5:31 am
    Apr 23, 2009 at 8:39 pm
  • Hi all! I have some jobs: job1, job2, job3,... . Each job working with the group. To control jobs, I have JobControllers, each JobController control jobs follow the specified group. Example: - Have 2 ...
    Nguyenhuynh.mr
    Apr 21, 2009 at 9:02 am
    Apr 22, 2009 at 7:24 am
  • I've written a MR job with multiple outputs. The "normal" output goes to files named part-XXXXX and my secondary output records go to files I've chosen to name "ExceptionDocuments" (and therefore are ...
    Stuart White
    Apr 20, 2009 at 8:15 pm
    Apr 21, 2009 at 8:55 pm
  • Hi, Its been several days since we have been trying to stabilize hadoop/hbase on ec2 cluster. but failed to do so. We still come across frequent region server fails, scanner timeout exceptions and OS ...
    Rakhi Khatwani
    Apr 17, 2009 at 4:40 pm
    Apr 21, 2009 at 12:45 pm
  • Hi, I ran a Hadoop MapReduce task in the local mode, reading and writing from HDFS, and it took 2.5 minutes. Essentially the same operations on the local file system without MapReduce took 1/2 ...
    Mark Kerzner
    Apr 20, 2009 at 4:27 am
    Apr 20, 2009 at 3:37 pm
  • Hi, I am running a map-reduce program on 6-Node ec2 cluster. and after a couple of hours all my tasks gets hanged. so i started digging into the logs.... there were no logs for regionserver no logs ...
    Rakhi Khatwani
    Apr 16, 2009 at 7:45 am
    Apr 16, 2009 at 3:16 pm
  • Hey all, I was trying to copy some data from our cluster on 0.19.2 to a new cluster on 0.18.3 by using disctp and the hftp:// filesystem. Everything seemed to be going fine for a few hours, but then ...
    Bryan Duxbury
    Apr 9, 2009 at 6:40 am
    Apr 9, 2009 at 8:56 pm
  • I tried to store protocolbuffer as BytesWritable in a sequence file <Text, BytesWritable . It's stored using SequenceFile.Writer(new Text(key), new BytesWritable(protobuf.convertToBytes())). When ...
    Bzheng
    Apr 9, 2009 at 12:00 am
    Apr 9, 2009 at 3:04 am
  • I am using a cluster of mixed hardware, 32-bit and 64-bit machines, to run Hadoop 0.18.3. I can't use the distribution tar ball since I need to apply a couple of patches. So I build my own Hadoop ...
    Bill Au
    Apr 6, 2009 at 9:46 pm
    Apr 7, 2009 at 2:18 am
Group Navigation
period: Apr 2009
Group Overview
group: common-user
categories: hadoop
discussions: 199
posts: 881
users: 202
website: hadoop.apache.org...
irc: #hadoop

202 users for April 2009

Jason hadoop: 57 posts
Stas Oskin: 41 posts
Aaron Kimball: 38 posts
Mithila Nagendra: 26 posts
Rasit OZDAS: 26 posts
Brian Bockelman: 25 posts
Todd Lipcon: 24 posts
Alex Loddengaard: 19 posts
Steve Loughran: 18 posts
Javateck javateck: 17 posts
Rakhi Khatwani: 17 posts
Farhan Husain: 16 posts
Owen O'Malley: 14 posts
Puri, Aseem: 13 posts
Tim robertson: 13 posts
Edward J. Yoon: 12 posts
Foss User: 12 posts
Jim Twensky: 12 posts
Sharad Agarwal: 12 posts
Nguyenhuynh.mr: 11 posts