166 discussions - 558 posts

  • We have some very large files that we access via memory mapping in Java. Someone's asked us about how to make this conveniently deployable in Hadoop. If we tell them to put the files into hdfs, can ...
    Benson Margulies
    Apr 11, 2011 at 10:57 pm
    Apr 14, 2011 at 2:42 am
  • Hi Everyone, I have a Cluster with one Master(JobTracker and NameNode - Intel Core2Duo 2 GB Ram) and four Slaves(Datanode and Tasktracker - Celeron 2 GB Ram). My Inputdata are between 2GB-10GB and I ...
    Baran cakici
    Apr 28, 2011 at 3:22 pm
    May 2, 2011 at 4:15 pm
  • Hi All I have created a map reduce job and to run on it on the cluster, i have bundled all jars(hadoop, hbase etc) into single jar which increases the size of overall file. During the development ...
    Shuja Rehman
    Apr 4, 2011 at 3:06 pm
    Apr 7, 2011 at 7:27 am
  • I have a Hadoop reducer that needs to write then read (key, value) pairs from a local temporary file. It seems like the way to do this is with Sequence Files, letting the Hadoop API choose their ...
    W.P. McNeill
    Apr 4, 2011 at 9:24 pm
    May 6, 2011 at 6:14 pm
  • Hello, One of our nodes has a bad hard disk which needs to be replaced. I'm planning on doing the following: 1) Decommission the node 2) Replace the disk 3) Bring the node back into the cluster Is ...
    Mayuran Yogarajah
    Apr 25, 2011 at 11:38 pm
    Apr 26, 2011 at 5:42 pm
  • Hi, As a part of my final year BE final project I want to estimate the time required by a M/R job given an application and a base file system. Can you folks please help me by posting some thoughts on ...
    Real great..
    Apr 16, 2011 at 10:02 am
    Apr 18, 2011 at 8:50 am
  • One of assumptions map reduce made, I think, is that size of map's output is smaller than input. Although we can see many applications have the same size of output with input, like, sort, merge,etc. ...
    Elton sky
    Apr 29, 2011 at 12:03 pm
    May 21, 2011 at 4:58 am
  • Hello everyone, I am new to hadoop... I set up a hadoop cluster of 4 ubuntu systems. ( Hadoop 0.20.2) and I am running the well known word count (gutenberg) example to test how fast my hadoop is ...
    Praveenesh kumar
    Apr 19, 2011 at 1:58 pm
    Apr 19, 2011 at 2:40 pm
  • Hey, I have been developing Map/Red jars for a while now, and I am still not comfortable with the developing environment I gathered for myself (and the team) I am curious how other Hadoop developers ...
    Guy Doulberg
    Apr 7, 2011 at 7:41 am
    Apr 8, 2011 at 9:40 pm
  • Hi guys I'm having a problem: I'm reading a file where fields are terminated by space (' ', ascii 32) into a table. I'm not making these files so I can't easily change this use of ' ' as field ...
    Bjørn Remseth
    Apr 4, 2011 at 9:51 am
    Apr 5, 2011 at 5:23 am
  • Hi, I need a help very bad. I got an HDFS permission error by starting to run hadoop job org.apache.hadoop.security.AccessControlException: Permission denied: user=wp, access=WRITE, ...
    Peng, Wei
    Apr 24, 2011 at 6:41 am
    Apr 25, 2011 at 5:09 pm
  • I have a requirement where I have large sets of incoming data into a system I own. A single unit of data in this set has a set of immutable attributes + state attached to it. The state is dynamic and ...
    Sam Seigal
    Apr 14, 2011 at 1:13 am
    Apr 15, 2011 at 3:58 am
  • Hey all, I'm trying to format my NameNode (I've done it successfully in the past), but I'm getting a strange error: 11/04/12 16:47:32 INFO common.Storage: java.io.IOException: Input/output error at ...
    Jeffrey Wang
    Apr 12, 2011 at 11:55 pm
    Apr 14, 2011 at 3:03 am
  • Dear Folks, i'm having a custom implementation of InputSplit which contains a combination of multiple blocks (similar to CombineFileInputFormat). Each splits can have a different "data-locality ...
    Johannes Zillmann
    Apr 20, 2011 at 10:08 am
    May 3, 2011 at 3:57 pm
  • Hi guys, I wanted to know exactly which was the latest stable release of Hadoop. In the site it says it's release 0.20.2, but 0.21.0 is also available and in the repository there's already a branch ...
    Juan P.
    Apr 29, 2011 at 3:31 am
    Apr 30, 2011 at 2:21 am
  • Hi All, This is question regarding "HDFS checksum" computation. I understood that When we read a file from HDFS by default it verifies the checksum and your read would not succeed if the file is ...
    Thamizh
    Apr 8, 2011 at 12:18 pm
    Apr 12, 2011 at 3:53 pm
  • yes,my key is ip,and value is a object(which inherited hadoop Record class,and will be converted a visualized data),e.g.: key field1,field2,field3(these are properties belong to object) 12.121.23.121 ...
    Leibnitz
    Apr 11, 2011 at 6:33 am
    Apr 12, 2011 at 2:37 am
  • Hi all, I have some architectural question. For my app I have persistent 50 GB data, which stored in HDFS, data is simple CSV format file. Also for my app which should be run over this (50 GB) data I ...
    Oleksiy
    Apr 10, 2011 at 9:10 pm
    Apr 11, 2011 at 3:43 pm
  • Hey guys, We are trying to figure out why many of our Map/Reduce jobs on the cluster are failing. In the log we are getting this message in the failing jobs: org.apache.hadoop.ipc.RemoteException: ...
    Guy Doulberg
    Apr 5, 2011 at 6:54 am
    Apr 6, 2011 at 8:01 am
  • Dear all, I have a running 4-node Hadoop cluster and some data stored in HDFS. Today by mistake,I start the hadoop cluster with root user. root bin/start-all.sh After correcting my mistake , when I ...
    Adarsh Sharma
    Apr 28, 2011 at 5:32 am
    Apr 28, 2011 at 9:00 am
  • Hi, I had asked a question about predicting map times in hadoop. Thanks a lot for the encouraging response. I want to know if anybody has a code or any idea on how to calculate the execution time? I ...
    Real great..
    Apr 20, 2011 at 9:29 am
    Apr 27, 2011 at 3:00 pm
  • Hi, I heard from so many people saying we should using JBOD instead of RAID, that is we should format each local disk(used for data storage) into an individual file system and define the mount point ...
    Xiaobo Gu
    Apr 25, 2011 at 3:04 pm
    Apr 26, 2011 at 3:00 pm
  • Hi all, I am trying to perform matrix-vector multiplication using Hadoop. So I have matrix M in a file, and vector v in another file. Obviously, files are of different sizes. Is it possible to make ...
    Aanghelescu
    Apr 22, 2011 at 9:34 pm
    Apr 25, 2011 at 2:09 pm
  • Hello people, I am beginner.. I already set up hadoop cluster of around 4 nodes. Now I am looking forward to make map-reduce programs on them. I am using eclipse plugin to connect to hadoop master ...
    Praveenesh kumar
    Apr 20, 2011 at 5:31 am
    Apr 21, 2011 at 5:09 pm
  • Hi, I'm using Cloudera's distribution with the pseudo config. I'm also using a system-wide install of RVM, which manages Ruby and Gems. My mapper is a Ruby script like this #!/bin/env ruby ... The ...
    Guang-Nan Cheng
    Apr 2, 2011 at 4:48 am
    Apr 16, 2011 at 4:24 am
  • Guys, How do we enable following parameters: # Logfile size and and 30-day backups log4j.appender.RFA.MaxFileSize=1MB log4j.appender.RFA.MaxBackupIndex=30 -- Thanks, Shah
    Shahnawaz Saifi
    Apr 20, 2011 at 11:46 am
    May 17, 2011 at 10:01 pm
  • Hi All, I am trying to copy files from one hadoop cluster to another hadoop cluster but I am getting following error: [phx1-rb-bi-dev50-metrics-qry1:]$ scripts/hadoop.sh distcp ...
    Sonia gehlot
    Apr 19, 2011 at 2:05 pm
    May 2, 2011 at 8:41 pm
  • Hi, People say a balanced server configration is as following: 2 4 Core CPU, 24G RAM, 4 1TB SATA Disks But we have been used to use storages servers with 24 1T SATA Disks, we are wondering will ...
    Xiaobo Gu
    Apr 26, 2011 at 1:55 pm
    Apr 27, 2011 at 1:58 pm
  • I'm getting many "cannot find symbol" errors. I've been searching everywhere and have given up. There has to be a good (and very simple) reason for why this is happening. My setup is as follows: ...
    Modemide
    Apr 18, 2011 at 6:35 pm
    Apr 20, 2011 at 8:17 pm
  • I need that during the execution of a particular job, a maximum of one map task execute on each cluster node. I've tried setting mapred.tasktracker.map.tasks.maximum=1 on job configuration but seems ...
    Massimo Schiavon
    Apr 15, 2011 at 3:04 pm
    Apr 15, 2011 at 5:00 pm
  • Hi All, I was trying to run the program using HOD on a cluster, when I allocate using 5 nodes, it runs fine, but when I allocate using 6 nodes, everytime I tried to run a program, I get this error: ...
    Boyu Zhang
    Apr 11, 2011 at 11:49 pm
    Apr 13, 2011 at 5:52 pm
  • I have a 0.20.2 cluster. I notice that our nodes with 2 TB disks waste tons of disk io doing a 'du -sk' of each data directory. Instead of 'du -sk' why not just do this with java.io.file? How is this ...
    Edward Capriolo
    Apr 8, 2011 at 4:16 am
    Apr 8, 2011 at 6:51 pm
  • Dear all, I am following the links below to configure Eclipse with the Hadoop environment, but I am not able to find the Map-Reduce Perspective under Open Perspective > Other. ...
    Adarsh Sharma
    Apr 8, 2011 at 4:15 am
    Apr 8, 2011 at 1:34 pm
  • I've been working from the 2nd Edition of Tom White's *Hadoop: The Definitive Guide*, but that's still old API (0.20). Are there any books in print that use the new API? Separating old-API vs. ...
    W.P. McNeill
    Apr 6, 2011 at 8:32 pm
    Apr 6, 2011 at 9:24 pm
  • How can I tell my job to include all the subdirectories and their content of a certain path? My directory structure is as follows: logs/{YEAR}/{MONTH}/{DAY} and I tried setting my input path to ...
    Mark
    Apr 6, 2011 at 2:54 pm
    Apr 6, 2011 at 5:08 pm
  • I am trying Rumen to process Hadoop logs. However, it always gives me errors. 1) name of job log file incorrect In my Hadoop installation, name of job log file looks like "job_<time _<index ...
    Zhenhua Guo
    Apr 1, 2011 at 8:50 pm
    Oct 11, 2011 at 4:06 pm
  • Hi, I guess I am not the first one to see the following exception when trying to initialize a LineRecordReader. However, so far I could't figure out a workaround for this problem. I saw that this ...
    Claus Stadler
    Apr 21, 2011 at 1:58 am
    Jun 2, 2011 at 11:21 pm
  • I got 2 questions: 1. I am wondering how hadoop MR performs when it runs compute intensive applications, e.g. Monte carlo method compute PI. There's a example in 0.21, QuasiMonteCarlo, but that ...
    Elton sky
    Apr 30, 2011 at 7:19 am
    May 2, 2011 at 6:29 pm
  • I don't know why I can't see my emails immediately sent to the group ... anyways, I'm sorting a sequenceFile using it's sorter on my local filesystem. The inputFile size is 1937690478 bytes. but ...
    Mark question
    Apr 28, 2011 at 8:46 pm
    May 2, 2011 at 4:56 pm
  • Hi, if you search for "Hadoop" on Dice, you get under 400 hits. We all have no doubt that the demand is greater. Does it mean that companies prefer to train their personnel rather than search for ...
    Mark Kerzner
    Apr 29, 2011 at 2:23 am
    Apr 29, 2011 at 2:10 pm
  • Hello, I'm having some problems setting up my datanodes, I have a 4 node cluster (all of them are datanodes), if I run sudo -u hdfs hadoop dfsadmin -report Configured Capacity: 112231907328 (104.52 ...
    Fabio Souto
    Apr 28, 2011 at 4:45 pm
    Apr 28, 2011 at 5:56 pm
  • Hello all, Actually, I realised about this problem when trying to use Mahout, trying to create vectors using the $MAHOUT_HOME/bin/mahout seqdirectory in my case: $MAHOUT_HOME/bin/mahout seqdirectory ...
    Liliana Mamani Sanchez
    Apr 27, 2011 at 1:25 pm
    Apr 27, 2011 at 2:53 pm
  • Hi, In some scenarios you have gzipped files as input for your map reduce job (apache logfiles is a common example). Now some of those files are several hundred megabytes and as such will be split by ...
    Niels Basjes
    Apr 27, 2011 at 7:56 am
    Apr 27, 2011 at 11:10 am
  • I want to create a sequence file on my local harddrive. I want to write something like this: LocalFileSystem fs = new LocalFileSystem(); Configuration configuration = new Configuration(); ...
    W.P. McNeill
    Apr 23, 2011 at 1:09 am
    Apr 25, 2011 at 6:06 pm
  • Dear all, I followed a link of a blog to configure Eclipse for running Map-reduce jobs. http://www.harshj.com/2010/07/18/making-the-eclipse-plugin-work-for-hadoop/ I am facing the same issue ...
    Adarsh Sharma
    Apr 21, 2011 at 11:42 am
    Apr 21, 2011 at 12:56 pm
  • I know in the logs you can see 'cmd=open' and the filename. Is there a way to see the closing of the file? Basically, I want to account for the total number of megabytes transferred in hdfs. What is ...
    Rita
    Apr 20, 2011 at 12:48 am
    Apr 21, 2011 at 10:13 am
  • Hi, all I'm confused by a question that "how does the HDFS decide where to put the data blocks " I mean that the user invokes some commands like "./hadoop put ***", we assume that this file consistes ...
    Nan Zhu
    Apr 18, 2011 at 1:46 pm
    Apr 19, 2011 at 7:26 pm
  • Looking at workloads like TeraSort where intermediate map output is proportional to HDFS block size, I was wondering whether it would be beneficial to have a mechanism for setting buffer spaces like ...
    Shrinivas Joshi
    Apr 12, 2011 at 7:26 pm
    Apr 16, 2011 at 4:17 pm
  • BUILD FAILED ......./branch-0.20-append/build.xml:927: The following error occurred while executing this line: ....../branch-0.20-append/build.xml:933: exec returned: 1 Total time: 1 minute 17 ...
    Alex Luya
    Apr 12, 2011 at 2:46 am
    Apr 16, 2011 at 2:37 am
  • I understand that part of the rules of MapReduce is that there's no shared global information; nevertheless I have a problem that requires shared global information and I'm trying to get a sense of ...
    W.P. McNeill
    Apr 11, 2011 at 4:31 pm
    Apr 15, 2011 at 9:44 pm
Group Navigation
period: Apr 2011
Group Overview
group: common-user @
categories: hadoop
discussions: 166
posts: 558
users: 186
website: hadoop.apache.org...
irc: #hadoop

186 users for April 2011

Harsh J: 46 posts
Ted Dunning: 22 posts
Praveenesh kumar: 18 posts
Adarsh Sharma: 15 posts
W.P. McNeill: 15 posts
Real great..: 12 posts
Xiaobo Gu: 12 posts
Bikash sharma: 10 posts
Mark Kerzner: 10 posts
Mark question: 10 posts
Guy Doulberg: 9 posts
Hadoopman: 9 posts
Harsh J: 9 posts
James Seigel Tynt: 9 posts
Steve Loughran: 9 posts
Jason Rutherglen: 8 posts
Baran cakici: 7 posts
Mehmet Tepedelenlioglu: 7 posts
Shuja Rehman: 7 posts
Shrinivas Joshi: 6 posts