115 discussions - 611 posts

  • Hi, can someone guide me on how to write a program using the Hadoop framework that analyzes log files and finds the most frequently occurring keywords? The log file has the format - keyword ...
    Tarandeep Singh
    Feb 4, 2008 at 10:04 pm
    Feb 5, 2008 at 8:00 pm
  • We are starting to build larger clusters, and want to better understand how to configure the network topology. Up to now we have just been setting up a private vlan for the small clusters. We have ...
    Jason Venner
    Feb 12, 2008 at 7:52 pm
    Feb 12, 2008 at 11:54 pm
  • The link inversion and ranking algorithms for Yahoo Search are now being generated on Hadoop: http://developer.yahoo.com/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html Some Webmap ...
    Owen O'Malley
    Feb 19, 2008 at 6:01 pm
    Feb 20, 2008 at 6:50 pm
  • Hi Folks, Let's get the word out that Hadoop is being used and is useful in your organizations, ok? Please add yourselves to the Hadoop powered by page, or reply to this email with what details you ...
    Eric Baldeschwieler
    Feb 21, 2008 at 6:27 am
    Mar 1, 2008 at 3:57 am
  • Does Hadoop cache frequently/LRU/MRU map input files? Or does it upload files from the disk each time a file is needed, no matter if it was the same file that was required by the last job on the same ...
    Shimi K
    Feb 10, 2008 at 2:52 pm
    Feb 11, 2008 at 4:49 pm
  • Hi everyone, I downloaded the nightly build (see below) yesterday, and after the cluster worked fine for about 10 hours I got the following error message from the DFS client even though all data nodes were ...
    André Martin
    Feb 21, 2008 at 1:29 pm
    Mar 4, 2008 at 1:31 pm
  • Arun - if you can't pull the API, then you must redirect the API to the new call that preserves its semantics. In this case - had we re-implemented SequenceFile.setCompressionType in 0.15 to call ...
    Joydeep Sen Sarma
    Feb 21, 2008 at 9:17 pm
    Feb 21, 2008 at 11:04 pm
  • If I have a write operation that takes a while between opening and closing the file, what is the effect of a node doing that writing crashing in the middle? For example, suppose I have large logs ...
    Steve Sapovits
    Feb 26, 2008 at 1:13 am
    Feb 29, 2008 at 10:38 pm
  • Hi, I think you have already heard the rumours that Microsoft could buy Yahoo. Does anybody have any idea how this could impact Hadoop's future specifically? I know this is all speculation for now... ...
    Lukas Vlcek
    Feb 1, 2008 at 4:11 pm
    Feb 4, 2008 at 2:19 pm
  • Hi, I don't care about the key value in the output file. Is there any way I can suppress the key in the output? Is there a way to tell (Text)OutputFormat not to write the key but the value only? Or can I ...
    Lukas Vlcek
    Feb 19, 2008 at 9:53 pm
    Feb 21, 2008 at 7:29 am
  • What's the best way to get additional configuration arguments to my mappers and reducers? Jeff
    Jeff Eastman
    Feb 9, 2008 at 11:40 pm
    Feb 11, 2008 at 10:22 pm
  • Hi All First of all since this is my first post I must say congrats for the great piece of software (both Hadoop and HBase). I've been using Hadoop&HBase for a while and I have a question, let me ...
    David Alves
    Feb 7, 2008 at 5:34 pm
    Feb 25, 2009 at 9:09 am
  • Hi, can I sort the output of the reducer based on the value instead of the key? Also, can I specify that the output should be sorted in decreasing order? Mapper output - <aWord, 1> Reducer gets - <aWord, ...
    Tarandeep Singh
    Feb 21, 2008 at 11:47 pm
    Feb 22, 2008 at 6:48 pm
  • In the Nutch wiki, I was reading this http://wiki.apache.org/hadoop/GettingStartedWithHadoop I have problems understanding this section: == Starting up a larger cluster == Ensure that the Hadoop ...
    Ben Kucinich
    Feb 7, 2008 at 6:52 pm
    Feb 12, 2008 at 5:11 pm
  • Hi, I'm relatively new to Hadoop and I have what I hope is a simple question: I don't understand why the key/value assumption is preserved AFTER the reduce operation, in other words why the output of ...
    Yuri Pradkin
    Feb 12, 2008 at 8:22 pm
    Feb 12, 2008 at 9:58 pm
  • Hi All: The following write-up is offered to help out anybody else who has seen performance problems and "hangs" while using dfs -copyToLocal/-cat. One of the performance problems that has been ...
    C G
    Feb 27, 2008 at 8:06 pm
    Feb 27, 2008 at 11:44 pm
  • Hi, I have an image processing library in C++ and want to run it as a MapReduce job via JNI. While I have some idea about how to include an external JAR into MapReduce, I am not sure how that works ...
    Chang Hu
    Feb 21, 2008 at 11:08 pm
    Feb 24, 2008 at 8:51 pm
  • There have been several proposals for a Lucene-based distributed index architecture. 1) Doug Cutting's "Index Server Project Proposal" at ...
    Ning Li
    Feb 6, 2008 at 7:00 pm
    Feb 8, 2008 at 3:31 am
  • I see the class is full with more than 50 watchers. Any chance the size will expand? If not, any date in mind for a second one?
    Feb 25, 2008 at 5:30 pm
    Mar 5, 2008 at 10:09 pm
  • Currently, we have the following setup: --cluster A, running Nutch: small RAM per node --cluster B, just running Hadoop: lots of RAM per node At some point in the future we will want cluster B to ...
    Miles Osborne
    Feb 28, 2008 at 10:44 am
    Feb 28, 2008 at 10:56 pm
  • I'm processing a number of .gz compressed Apache and other logs using Hadoop 0.15.2 and encountering fatal decompression errors such as: 08/02/26 12:09:12 INFO mapred.JobClient: Task Id : ...
    Jeff Eastman
    Feb 26, 2008 at 8:59 pm
    Feb 27, 2008 at 8:31 pm
  • Hi, I am able to get Hadoop running and also able to compile the libhdfs. But when I run the hdfs_test program it is giving Segmentation Fault. Just a small program like this #include "hdfs.h" int ...
    Raghavendra K
    Feb 21, 2008 at 11:30 am
    Feb 26, 2008 at 10:56 am
  • Are there any existing HDFS access packages out there for Python? I've had some success using SWIG and the C HDFS code, as documented here: http://www.stat.purdue.edu/~sguha/code.html (halfway down ...
    Steve Sapovits
    Feb 21, 2008 at 9:23 pm
    Feb 22, 2008 at 12:22 am
  • Hello all, how do I configure Hadoop or Eclipse if my Hadoop SSH port is not 22? -- David
    Feb 13, 2008 at 6:50 am
    Feb 15, 2008 at 11:08 pm
  • Chris Kline reported a problem in early January where a file which had too few replicated blocks did not get replicated until a DFS restart. I just saw a similar issue. I had a file that had a block ...
    Ted Dunning
    Feb 8, 2008 at 1:06 am
    Feb 9, 2008 at 1:04 am
  • Sorry for the cross-post to hadoop and hbase. Is the hbase-user group active yet? I haven't got any e-mails from it. I am having a problem with hbase mapreduce I get the following exception my map ...
    Marc Harris
    Feb 4, 2008 at 5:10 pm
    Feb 4, 2008 at 7:42 pm
  • When running in Pseudo Distributed mode as outlined in the Quickstart, I see that the DFS is, at some level, identified by the IP address it was created under. I'm doing this on a laptop and when I ...
    Steve Sapovits
    Feb 27, 2008 at 1:15 pm
    Feb 27, 2008 at 8:05 pm
  • Hi, I'm currently looking into how to better scale the performance of our calculations involving large sets of financial data. It is currently using a series of Oracle SQL statements to perform the ...
    Chuck Lan
    Feb 22, 2008 at 5:14 pm
    Feb 26, 2008 at 1:49 am
  • Hi All: The documentation for the configuration parameters mapred.map.tasks and mapred.reduce.tasks discusses these values in terms of “number of available hosts” in the grid. This description strikes ...
    C G
    Feb 20, 2008 at 5:31 pm
    Feb 22, 2008 at 5:26 pm
  • Hello, My first time posting this in the news group. My question sounds more like a MapReduce question instead of Hadoop HDFS itself. To my understanding, the JobClient will submit all Mapper and ...
    Feb 15, 2008 at 9:26 pm
    Feb 20, 2008 at 1:22 am
  • What is the best way to kill a bad job (e.g. an infinite loop)? The job I was running went into an infinite loop and I had to stop it with ctrl-c on the master node. Then I used bin/stop-all.sh ...
    Jim the Standing Bear
    Feb 13, 2008 at 6:33 am
    Feb 13, 2008 at 5:52 pm
  • I have Hadoop running on a master node fs.default.name is and mapred.job.tracker is I am accessing its web pages on port 50030 from another ...
    Ben Kucinich
    Feb 8, 2008 at 4:29 pm
    Feb 11, 2008 at 7:01 pm
  • The reduce only does a merge of sorted segments. The segments have to be sorted using all the sort fields before the merge itself. Otherwise u can't do a merge. (hope I understood the question right) ...
    Joydeep Sen Sarma
    Feb 6, 2008 at 7:58 pm
    Feb 8, 2008 at 12:39 am
  • Sorry about the word-wrapping (original email) - Yahoo Mail problem :( Is anyone going to be capturing the Piglet meeting on video for those of us living in other corners of the planet? Thank ...
    Otis Gospodnetic
    Feb 6, 2008 at 7:40 pm
    Feb 7, 2008 at 9:48 pm
  • I have been using Hadoop for a couple of months now, and I recently moved to an x86_64 platform. When I ran some jobs that I've run previously on the 32-bit cluster, I got OutOfMemoryError on a large ...
    Travis Woodruff
    Feb 5, 2008 at 12:42 am
    Feb 5, 2008 at 7:40 pm
  • Hi, We're having problems when trying to deal with the namenode failover, by following the wiki http://wiki.apache.org/hadoop/NameNodeFailover If we point dfs.name.dir to 2 local directories, it ...
    Nathan Wang
    Feb 23, 2008 at 1:26 am
    May 30, 2008 at 5:46 pm
  • Hello everyone, I have a problem with directory hdfs:///mapredsystem and I'm not sure if it's a bug or my fault. Not sure if this influences what follows, but I have two users, one is "hadoop" who ...
    Luca Telloli
    Feb 26, 2008 at 5:12 pm
    Feb 27, 2008 at 10:52 am
  • Hello everyone, I've been trying to run HOD on a sample cluster with three nodes that already have Torque installed and (hopefully?) properly working. I also prepared a configuration file for hod, ...
    Feb 22, 2008 at 4:50 am
    Feb 25, 2008 at 3:51 pm
  • hi brothers, i am completely confused about the hadoop usage / deployment etc. yes, i did read the documentation & other details on the Apache foundation site yet i am a dumb a** n perhaps hence am still ...
    Sher Khan
    Feb 23, 2008 at 10:38 am
    Feb 25, 2008 at 5:52 am
  • Hi all, I have a program that needs to use two reduce functions. Who can tell me why? Thank you! Qiang
    Ma qiang
    Feb 23, 2008 at 7:34 am
    Feb 25, 2008 at 4:32 am
  • Hi all: Here I have two MapReduce programs. I need to use the result of the first MapReduce program to compute other values which are generated in the second MapReduce program, and this intermediate ...
    Ma qiang
    Feb 21, 2008 at 6:21 am
    Feb 21, 2008 at 7:54 pm
  • Hi, Are there any known issues on how dfsadmin reports disk usage? I'm getting some weird values: Name: State : In Service Total raw bytes: 1433244008448 (1.3 TB) Remaining raw ...
    Martin Traverso
    Feb 15, 2008 at 9:06 pm
    Feb 16, 2008 at 2:20 am
  • Hi, Is it possible to have Reducer output the data into two different formats at the same time? For example one output in SequenceFileOutputFormat for further processing by consequential M/R job and ...
    Lukas Vlcek
    Feb 14, 2008 at 10:05 pm
    Feb 15, 2008 at 6:55 pm
  • Hey all, I'm just starting with both Hadoop and HBase. I've created a 3-node cluster - 1 master and 2 slaves. I've had some fun in the shell, where everything works as expected, and look forward to ...
    Cass Costello
    Feb 3, 2008 at 8:50 am
    Feb 5, 2008 at 9:04 am
  • Are the client jars available in a place like ibiblio or repo1.maven.org so that they can easily be used with maven builds? It's kind of a pain having to deploy them into my local repository every ...
    Marc Harris
    Feb 1, 2008 at 3:53 pm
    Feb 4, 2008 at 8:05 pm
  • Hi all: I have a MapReduce program now, and in my map function I need to use some parameters which are read from another table in HBase using the HTable class; as a result I find that this program runs very slowly. ...
    Ma qiang
    Feb 1, 2008 at 10:36 am
    Feb 2, 2008 at 12:13 am
  • Hello, I've been experimenting with Hadoop for a few days now, but I'm stuck trying to output different classes from map and reduce methods. I have something like: class test { public static class Map extends ...
    Feb 28, 2008 at 2:09 am
    Mar 5, 2008 at 3:29 pm
  • Hi everyone, I'm seeing the above exception on my DFS clients: Any idea why this exception is thrown? Thx in advance. Cu on the 'net, Bye - bye, <<<<< André <<<< èrbnA
    André Martin
    Feb 29, 2008 at 10:43 pm
    Mar 1, 2008 at 6:35 pm
  • I have found that HOD writes a series of log files to directories on the virtual cluster master, if you specify log directories. The interesting part is figuring out which machine was the virtual ...
    Jason Venner
    Feb 28, 2008 at 3:07 am
    Feb 29, 2008 at 5:08 am
  • I am interested in examining a MapReduce execution in order to determine the amount of time it takes to execute each of the following parts of a MapReduce job: - Loading of data onto mappers - ...
    Alexander Mont
    Feb 26, 2008 at 4:25 am
    Feb 26, 2008 at 5:31 am
Group Overview
group: common-user @

143 users for February 2008

  • Ted Dunning: 80 posts
  • Jason Venner: 35 posts
  • Joydeep Sen Sarma: 22 posts
  • Steve Sapovits: 20 posts
  • Owen O'Malley: 19 posts
  • Raghu Angadi: 17 posts
  • Amar Kamat: 15 posts
  • Arun C Murthy: 14 posts
  • Jeff Eastman: 13 posts
  • Doug Cutting: 12 posts
  • Edward yoon: 12 posts
  • Miles Osborne: 12 posts
  • Tarandeep Singh: 11 posts
  • Lukas Vlcek: 10 posts
  • Tim Wintle: 10 posts
  • C G: 9 posts
  • Ma qiang: 9 posts
  • Marc Harris: 9 posts
  • Peter W.: 9 posts
  • Allen Wittenauer: 8 posts