Try this rather small C++ program...it will more than likely be a LOT faster than anything you could do in Hadoop. Hadoop is not the hammer for every nail. Too many people think that any "cluster" solution will automagically scale their problem...tain't true.

I'd appreciate hearing your results with this.

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main(int argc, char *argv[])
{
    if (argc < 2) {
        cerr << "Usage: " << argv[0] << " [filename]" << endl;
        return -1;
    }
    ifstream in(argv[1]);
    if (!in) {
        cerr << "Cannot open " << argv[1] << endl;
        return -1;
    }
    string str;
    int n = 0;
    // operator>> skips whitespace, so each successful read is one word
    while (in >> str) {
        //cout << str << endl;
        n++;
    }
    cout << n << " words" << endl;
    return 0;
}

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate

From: Igor Bubkin [igba14@gmail.com]
Sent: Tuesday, February 01, 2011 2:19 AM
To: common-issues@hadoop.apache.org
Cc: common-user@hadoop.apache.org
Subject: EXTERNAL:How to speed up of Map/Reduce job?

Hello everybody

I have a problem. I installed Hadoop on a 2-node cluster and ran the Wordcount
example. It takes about 20 seconds to process a 1.5 MB text file. We want to
use Map/Reduce in real time (interactively, driven by users' requests). A user
can't wait 20 seconds for his request; this is too long. Is it possible to
reduce the time of a Map/Reduce job? Or maybe I misunderstand something?

Igor Babkin, Mifors.com
