Try this rather small C++ program...it will more than likely be a LOT faster than anything you could do in Hadoop. Hadoop is not the hammer for every nail. Too many people think that any "cluster" solution will automagically scale their problem...tain't true.
I'd appreciate hearing your results with this.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
    if (argc < 2) {
        cerr << "Usage: " << argv[0] << " [filename]" << endl;
        return -1;
    }
    ifstream in(argv[1]);
    if (!in) {
        perror(argv[1]);
        return -1;
    }
    // Count whitespace-delimited words. Testing the stream extraction
    // directly avoids the eof() loop idiom, which can skip the last word.
    string str;
    int n = 0;
    while (in >> str) {
        ++n;
        //cout << str << endl;
    }
    in.close();
    cout << n << " words" << endl;
    return 0;
}
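To try it, compile with any standard C++ compiler and run it against your file (the file name wordcount.cpp below is just an example):
    g++ -O2 wordcount.cpp -o wordcount
    time ./wordcount yourfile.txt
Timing that against your 1.5MB input should give you a direct comparison with the 20-second Hadoop job.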
Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate
________________________________________
From: Igor Bubkin [[email protected]]
Sent: Tuesday, February 01, 2011 2:19 AM
To: [email protected]
Cc: [email protected]
Subject: EXTERNAL:How to speed up of Map/Reduce job?
Hello everybody
I have a problem. I installed Hadoop on a 2-node cluster and ran the Wordcount
example. It takes about 20 seconds to process a 1.5 MB text file. We want to
use Map/Reduce in real time (interactively, driven by user requests). A user can't
wait 20 seconds for a response. That is too long. Is it possible to reduce the time
of a Map/Reduce job? Or maybe I misunderstand something?
BR,
Igor Babkin, Mifors.com