On Thu, Jul 14, 2011 at 10:21 AM, Alireza Mahani wrote:

(I am using a LINUX machine)

Jeff,

In creating reproducible results, I 'partially' answered my question. I have

attached two scripts, 'mvMultiply.r' and 'mvMultiply.cc'. Please copy both

files into your chosen directory, then run 'Rscript mvMultiply.r' in that

directory while changing the two boolean parameters 'INCLUDE_DATAPREP' and

'ROWMAJOR' to all four permutations. (The variable 'diffVec' is there to

verify that the two methods produce the same output vector.)

Below are the results that I get, along with discussion (tR and tCall are in

sec):

INCLUDE_DATAPREP,ROWMAJOR,tR,tCall

F,F,13.536,13.875

F,T,13.824,14.299

T,F,13.688,18.167

T,T,13.982,30.730

Interpretation: The execution time for the .Call line is nearly identical to

the call to R operator '%*%'. Two data preparation lines for the .Call

method create the overhead:

A <- t(A) (~12sec, or 12usec per call)

dim(A) <- dim(A)[1] * dim(A)[2] (~4sec, or 4usec per call)

While the first line can be avoided by providing options in c++ function (as

is done in the BLAS API), I wonder if the second line can be avoided, aside

from the obvious option of rewriting the R scripts to use vectors instead of

matrices. But this defies one of the primary advantages of using R, which is

succinct, high-level coding. In particular, if one has several matrices as

input into a .Call function, then the overhead from matrix-to-vector

transformations can add up. To summarize, my questions are:

1- Do the above results seem reasonable to you? Is there a similar penalty

in R's '%*%' operator for transforming matrices to vectors before calling

BLAS functions?

2- Are there techniques for reducing the above overhead for developers

looking to augment their R code with compiled code?

Regards,

Alireza

---------------------------------------

# mvMultiply.r

# comparing performance of matrix multiplication in R (using '%*%' operator)

vs. calling compiled code (using .Call function)

# y [m x 1] = A [m x n] %*% x [n x 1]

rm(list = ls())

system("R CMD SHLIB mvMultiply.cc")

dyn.load("mvMultiply.so")

INCLUDE_DATAPREP <- F

ROWMAJOR <- F #indicates whether the c++ function treats A in a row-major or

column-major fashion

m <- 100

n <- 10

N <- 1000000

diffVec <- array(0, dim = N)

tR <- 0.0

tCall <- 0.0

for (i in 1:N) {

? ? ? ?A <- runif(m*n); dim(A) <- c(m,n)

? ? ? ?x <- runif(n)

? ? ? ?t1 <- proc.time()[3]

? ? ? ?y1 <- A %*% x

? ? ? ?tR <- tR + proc.time()[3] - t1

? ? ? ?if (INCLUDE_DATAPREP) {

? ? ? ? ? ? ? ?t1 <- proc.time()[3]

? ? ? ?}

? ? ? ?if (ROWMAJOR) { #since R will convert matrix to vector in a column-major

fashion, if the c++ function expects a row-major format, we need to

transpose A before converting it to a vector

? ? ? ? ? ? ? ?A <- t(A)

? ? ? ?}

? ? ? ?dim(A) <- dim(A)[1] * dim(A)[2]

? ? ? ?if (!INCLUDE_DATAPREP) {

? ? ? ? ? ? ? ?t1 <- proc.time()[3]

? ? ? ?}

? ? ? ?y2 <- .Call("matvecMultiply", as.double(A), as.double(x),

as.logical(c(ROWMAJOR)))

? ? ? ?tCall <- tCall + proc.time()[3] - t1

? ? ? ?diffVec[i] <- max(abs(y2 - y1))

}

cat("Data prep time for '.Call' included: ", INCLUDE_DATAPREP, "\n")

cat("C++ function expects row-major matrix: ", ROWMAJOR, "\n")

cat("Time - Using '%*%' operator in R: ", tR, "sec\n")

cat("Time - Using '.Call' function: ", tCall, "sec\n")

cat("Maximum difference between methods: ", max(diffVec), "\n")

dyn.unload("mvMultiply.so")

---------------------------------------

# mvMultiply.cc

#include <Rinternals.h>

#include <R.h>

extern "C" {

SEXP matvecMultiply(SEXP A, SEXP x, SEXP rowmajor) {

? ? ? ?double *rA = REAL(A), *rx = REAL(x), *ry;

? ? ? ?int *rrm = LOGICAL(rowmajor);

? ? ? ?int n = length(x);

? ? ? ?int m = length(A) / n;

? ? ? ?SEXP y;

? ? ? ?PROTECT(y = allocVector(REALSXP, m));

? ? ? ?ry = REAL(y);

? ? ? ?for (int i = 0; i < m; i++) {

? ? ? ? ? ? ? ?ry[i] = 0.0;

? ? ? ? ? ? ? ?for (int j = 0; j < n; j++) {

? ? ? ? ? ? ? ? ? ? ? ?if (rrm[0] == 1) {

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ry[i] += rA[i * n + j] * rx[j];

? ? ? ? ? ? ? ? ? ? ? ?} else {

? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?ry[i] += rA[j * m + i] * rx[j];

? ? ? ? ? ? ? ? ? ? ? ?}

? ? ? ? ? ? ? ?}

? ? ? ?}

? ? ? ?UNPROTECT(1);

? ? ? ?return(y);

}

}

I realize that you are just beginning to use the .C and .Call

interfaces and your example is therefore a simple one. However, if

you plan to continue with such development it is worthwhile learning

of some of the tools available. I think one of the most important is

the "inline" package that can take a C or C++ code segment as a text

string and go through all the steps of creating and loading a

.Call'able compiled function.

Second, if you are going to use C++ (the code you show could be C code

as it doesn't use any C++ extensions) then you should look at the Rcpp

package written by Dirk Eddelbuettel and Romain Francois which allows

for comparatively painless interfacing of R objects and C++ objects.

The Rcpp-devel list, which I have copied on this reply, is for

questions related to that system. The inline package allows for

various "plugin" constructions to wrap your code in the appropriate

headers and point the compiler to the locations of header files and

libraries. There are two extensions to Rcpp for numerical linear

algebra in C++, RcppArmadillo and RcppEigen. I show the use of

RcppEigen here.

Third there are several packages in R that do the busy work of

benchmarking expressions and neatly formulating the results. I use

the rbenchmark package.

Putting all these together yields the enclosed script and results.

In Eigen, a MatrixXd object is the equivalent of R's numeric matrix

(similarly MatrixXi for integer and MatrixXcd for complex) and a

VectorXd object is the equivalent of a numeric vector. A "mapped"

matrix or vector is one that uses the storage allocated by R, thereby

avoiding a copy operation (similar to your accessing elements of the

arrays through the pointer returned by REAL()). To adhere to R's

functional programming semantics it is a good idea to declare such

objects as const. The 'as' and 'wrap' functions are provided by Rcpp

with extensions in RcppEigen to the Eigen classes in the development

version. In the released versions of Rcpp and RcppEigen we use

intermediate Rcpp objects. These functions have the advantage of

checking the types of R objects being passed. The Eigen code for

matrix multiplication will check the consistency of dimensions in the

operation.

Given the size of the matrix you are working with it is not surprising

that interpretation overhead and checking will be a large part of the

elapsed time, hence the relative differences between different methods

of doing the numerical calculation will be small. The operation of

multiplying a 100 x 10 matrix by a 10-vector involves "only" 1000

floating point operations. Furthermore, each element of the matrix is

used only once so sophisticated methods of manipulating cache contents

won't buy you much. These benchmark results are from a system that

uses Atlas BLAS (basic linear algebra subroutines); other systems will

provide different results. Interestingly, I found on some systems

using R's BLAS, which are not accelerated, the R code is closer in

speed to the code using Eigen. An example is given in the second

version of the output.

-------------- next part --------------

R Under development (unstable) (2011-07-19 r56429)

Copyright (C) 2011 The R Foundation for Statistical Computing

ISBN 3-900051-07-0

Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.

You are welcome to redistribute it under certain conditions.

Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.

Type 'contributors()' for more information and

'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or

'help.start()' for an HTML browser interface to help.

Type 'q()' to quit R.

library(rbenchmark)

library(inline)

library(RcppEigen)

Loading required package: Rcpp

## create an R function from Eigen-based C++ code

ff <- if (packageVersion("RcppEigen") > "0.1.1") { # development version

+ cxxfunction(signature(Xs = "matrix", ys = "numeric"), '

+ typedef Eigen::Map<Eigen::MatrixXd> MMatrixXd;

+ typedef Eigen::Map<Eigen::VectorXd> MVectorXd;

+

+ const MMatrixXd Xe(as<MMatrixXd>(Xs));

+ const MVectorXd ye(as<MVectorXd>(ys));

+ return wrap(Xe * ye);

+ ', plugin = "RcppEigen")

+ } else {

+ cxxfunction(signature(Xs = "matrix", ys = "numeric"), '

+ typedef Eigen::Map<Eigen::MatrixXd> MMatrixXd;

+ typedef Eigen::Map<Eigen::VectorXd> MVectorXd;

+

+ const NumericMatrix X(Xs);

+ const NumericVector y(ys);

+ const MMatrixXd Xe(X.rows(), X.cols(), X.begin());

+ const MVectorXd ye(y.size(), y.begin());

+ Eigen::VectorXd res = Xe * ye;

+ return NumericVector(res.data(), res.data() + res.size());

+ ', plugin = "RcppEigen")

+ }

set.seed(1) # for reproducible results

m <- 100

n <- 10

A <- matrix(runif(m * n), ncol = n)

y <- runif(n)

all.equal(as.vector(A %*% y), ff(A, y)) [1] TRUE

benchmark(Rcode = expression(A %*% y), Eigen = ff(A, y), replications0000)

test replications elapsed relative user.self sys.self user.child sys.child

2 Eigen 100000 1.198 1.000000 1.19 0.00 0 0

1 Rcode 100000 1.890 1.577629 1.88 0.01 0 0

m <- 1000

n <- 1000

A <- matrix(runif(m * n), ncol = n)

y <- runif(n)

all.equal(as.vector(A %*% y), ff(A, y)) [1] TRUE

benchmark(Rcode = expression(A %*% y), Eigen = ff(A, y), replications00)

test replications elapsed relative user.self sys.self user.child sys.child

2 Eigen 1000 3.259 1.000000 3.25 0.00 0 0

1 Rcode 1000 15.648 4.801473 17.27 0.39 0 0

proc.time()

user system elapsed

31.070 1.160 30.495

-------------- next part --------------

R version 2.13.1 (2011-07-08)

Copyright (C) 2011 The R Foundation for Statistical Computing

ISBN 3-900051-07-0

Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.

You are welcome to redistribute it under certain conditions.

Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.

Type 'contributors()' for more information and

'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or

'help.start()' for an HTML browser interface to help.

Type 'q()' to quit R.

library(rbenchmark)

library(inline)

library(RcppEigen)

Loading required package: Rcpp

## create an R function from Eigen-based C++ code

ff <- if (packageVersion("RcppEigen") > "0.1.1") { # development version

+ cxxfunction(signature(Xs = "matrix", ys = "numeric"), '

+ typedef Eigen::Map<Eigen::MatrixXd> MMatrixXd;

+ typedef Eigen::Map<Eigen::VectorXd> MVectorXd;

+

+ const MMatrixXd Xe(as<MMatrixXd>(Xs));

+ const MVectorXd ye(as<MVectorXd>(ys));

+ return wrap(Xe * ye);

+ ', plugin = "RcppEigen")

+ } else {

+ cxxfunction(signature(Xs = "matrix", ys = "numeric"), '

+ typedef Eigen::Map<Eigen::MatrixXd> MMatrixXd;

+ typedef Eigen::Map<Eigen::VectorXd> MVectorXd;

+

+ const NumericMatrix X(Xs);

+ const NumericVector y(ys);

+ const MMatrixXd Xe(X.begin(), X.rows(), X.cols());

+ const MVectorXd ye(y.begin(), y.size());

+ Eigen::VectorXd res = Xe * ye;

+ return NumericVector(res.data(), res.data() + res.size());

+ ', plugin = "RcppEigen")

+ }

set.seed(1) # for reproducible results

m <- 100

n <- 10

A <- matrix(runif(m * n), ncol = n)

y <- runif(n)

all.equal(as.vector(A %*% y), ff(A, y)) [1] TRUE

benchmark(Rcode = expression(A %*% y), Eigen = ff(A, y), replications0000)

test replications elapsed relative user.self sys.self user.child sys.child

2 Eigen 100000 0.808 1.000000 0.808 0.001 0 0

1 Rcode 100000 1.037 1.283416 1.031 0.006 0 0

m <- 1000

n <- 1000

A <- matrix(runif(m * n), ncol = n)

y <- runif(n)

all.equal(as.vector(A %*% y), ff(A, y)) [1] TRUE

benchmark(Rcode = expression(A %*% y), Eigen = ff(A, y), replications00)

test replications elapsed relative user.self sys.self user.child sys.child

2 Eigen 1000 1.957 1.000000 1.956 0 0 0

1 Rcode 1000 6.837 3.493613 6.835 0 0 0

proc.time()

user system elapsed

14.178 0.388 14.527