Hello,
I'd like to announce the release of version 0.1 of RHIPE: R and
Hadoop Integrated Processing Environment. With RHIPE, it is possible
to write map-reduce algorithms in the R language and launch them
from within R.
RHIPE is built on Hadoop, so it benefits from Hadoop's fault
tolerance, distributed file system, and job scheduling features.
For the R user, there is rhlapply, which runs an lapply across the cluster.
For the Hadoop user, there is rhmr, which runs a general map-reduce program.

The tired example of counting words:

# Map: split the input line into words and emit one (word, count) pair
# per distinct word.
m <- function(key, val) {
  words <- strsplit(val, " +")[[1]]
  wc <- table(words)
  cln <- names(wc)
  return(lapply(seq_along(wc), function(i)
    list(key = cln[i], value = wc[[i]])))
}
# Reduce: sum all the counts emitted for one word.
r <- function(key, value) {
  value <- do.call("rbind", value)
  return(list(list(key = key, value = sum(value))))
}
rhmr(mapper = m, reduce = r, input.folder = "X", output.folder = "Y")
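
To see what the mapper and reducer compute without a Hadoop cluster, here is a minimal sketch in plain R that mimics the map, shuffle, and reduce phases in memory. It is not part of RHIPE: run_local is a hypothetical helper, and it assumes that each input line is passed to the mapper as val with its line index as key, which may differ from how rhmr actually feeds records.

```r
# Mapper and reducer in the same shape as above: each returns a list of
# list(key=, value=) pairs.
m <- function(key, val) {
  words <- strsplit(val, " +")[[1]]
  words <- words[nzchar(words)]   # drop the empty string left by leading blanks
  wc <- table(words)
  cln <- names(wc)
  lapply(seq_along(cln), function(i) list(key = cln[i], value = wc[[i]]))
}
r <- function(key, value) {
  list(list(key = key, value = sum(unlist(value))))
}

# Hypothetical local driver (not a RHIPE function).
run_local <- function(lines, mapper, reducer) {
  # Map phase: one mapper call per line; the key is the line index (assumption).
  pairs <- list()
  for (i in seq_along(lines)) pairs <- c(pairs, mapper(i, lines[[i]]))
  # Shuffle phase: group the emitted values by key.
  keys <- vapply(pairs, function(p) p$key, character(1))
  vals <- lapply(pairs, function(p) p$value)
  grouped <- split(vals, keys)
  # Reduce phase: one reducer call per distinct key.
  out <- list()
  for (k in names(grouped)) out <- c(out, reducer(k, grouped[[k]]))
  out
}

lines <- c("the quick brown fox", "the lazy dog", "the fox")
res <- run_local(lines, m, r)
# Flatten the result into a named vector of counts for easy inspection.
counts <- setNames(vapply(res, function(p) p$value, numeric(1)),
                   vapply(res, function(p) p$key, character(1)))
print(counts)
```

On a real cluster the grouping step is done by Hadoop between the map and reduce phases; the sketch only illustrates the data flow.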

URL: http://ml.stat.purdue.edu/rhipe

There are some downsides to RHIPE which are described at
http://ml.stat.purdue.edu/rhipe/install.html#sec-5

Regards
Saptarshi Guha

Posted: Apr 27, '09 at 2:15p