Grokbase Groups · r-devel · July 2011
Performance of .C and .Call functions vs. native R code
Hello,

I am in the process of writing an R extension for parallelized MCMC, with
heavy use of compiled code (C++). I have been getting my feet wet by
implementing a simple matrix-vector multiplication function in C++ (which
calls the BLAS level-2 function dgemv) and comparing it to the '%*%' operator
in R (which apparently calls the BLAS level-3 function dgemm).

Interestingly, I cannot replicate the performance of the native R operator
using either '.C' or '.Call'. The relative times are 17 (R), 30 (.C), and 26
(.Call); in other words, the native R operator is roughly 1.5x faster than my
compiled code. Can you explain why this is? Through testing I strongly suspect
that the BLAS function itself isn't what takes the bulk of the time; rather,
data transfer and other overhead associated with the calls (.C and .Call)
seem to be the main issues. Are there any ways to reach the performance
level of native R code in this case?
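
The C++ function described above is not included in the post; a minimal
.Call wrapper around dgemv might look like the following sketch (a
reconstruction under assumptions, not the poster's actual code; it assumes
A is passed column-major with its dim attribute intact):

---------------------------------------
#include <Rinternals.h>
#include <R_ext/BLAS.h>

extern "C" SEXP matvecDgemv(SEXP A, SEXP x) {
    /* hypothetical reconstruction: y = A %*% x via the level-2 BLAS dgemv */
    SEXP dims = Rf_getAttrib(A, R_DimSymbol);
    int m = INTEGER(dims)[0], n = INTEGER(dims)[1];
    const double one = 1.0, zero = 0.0;
    const int inc = 1;
    SEXP y = PROTECT(Rf_allocVector(REALSXP, m));
    /* "N" = no transpose; A is column-major with leading dimension m */
    F77_CALL(dgemv)("N", &m, &n, &one, REAL(A), &m,
                    REAL(x), &inc, &zero, REAL(y), &inc);
    UNPROTECT(1);
    return y;
}
---------------------------------------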

Thank you,
Alireza Mahani

--
View this message in context: http://r.789695.n4.nabble.com/Performance-of-C-and-Call-functions-vs-native-R-code-tp3665017p3665017.html
Sent from the R devel mailing list archive at Nabble.com.


  • Jeff Ryan at Jul 14, 2011 at 12:12 pm
The .Call overhead isn't the issue. If you'd like some insight into what you are doing wrong (and right), you need to provide code with which the list can reproduce your timings.

    This is outlined in the posting guide as well.

    Best,
    Jeff


  • Alireza Mahani at Jul 14, 2011 at 3:21 pm
(I am using a Linux machine)

Jeff,

In creating reproducible results, I have 'partially' answered my own question.
I have attached two scripts, 'mvMultiply.r' and 'mvMultiply.cc'. Please copy
both files into a directory of your choice, then run 'Rscript mvMultiply.r'
in that directory while changing the two Boolean parameters
'INCLUDE_DATAPREP' and 'ROWMAJOR' through all four permutations. (The
variable 'diffVec' is there to verify that the two methods produce the same
output vector.)

    Below are the results that I get, along with discussion (tR and tCall are in
    sec):

    INCLUDE_DATAPREP,ROWMAJOR,tR,tCall
    F,F,13.536,13.875
    F,T,13.824,14.299
    T,F,13.688,18.167
    T,T,13.982,30.730

Interpretation: the execution time for the .Call line itself is nearly
identical to the call to the R operator '%*%'. Two data-preparation lines
for the .Call method create the overhead:

A <- t(A)                        (~12 sec total, or ~12 usec per call)
dim(A) <- dim(A)[1] * dim(A)[2]  (~4 sec total, or ~4 usec per call)

While the first line can be avoided by providing an option in the C++
function (as the BLAS API does), I wonder if the second line can be avoided,
aside from the obvious option of rewriting the R scripts to use vectors
instead of matrices. That, however, would defeat one of the primary
advantages of using R, namely succinct, high-level coding. In particular, if
one has several matrices as input into a .Call function, the overhead from
matrix-to-vector transformations can add up. To summarize, my questions are:

    1- Do the above results seem reasonable to you? Is there a similar penalty
    in R's '%*%' operator for transforming matrices to vectors before calling
    BLAS functions?
    2- Are there techniques for reducing the above overhead for developers
    looking to augment their R code with compiled code?

    Regards,
    Alireza

    ---------------------------------------
# mvMultiply.r
# comparing performance of matrix multiplication in R (using the '%*%'
# operator) vs. calling compiled code (using the .Call function)
# y [m x 1] = A [m x n] %*% x [n x 1]

rm(list = ls())
system("R CMD SHLIB mvMultiply.cc")
dyn.load("mvMultiply.so")

INCLUDE_DATAPREP <- F
ROWMAJOR <- F  # whether the C++ function treats A in a row-major or
               # column-major fashion

m <- 100
n <- 10
N <- 1000000

diffVec <- array(0, dim = N)
tR <- 0.0
tCall <- 0.0
for (i in 1:N) {

    A <- runif(m*n); dim(A) <- c(m,n)
    x <- runif(n)

    t1 <- proc.time()[3]
    y1 <- A %*% x
    tR <- tR + proc.time()[3] - t1

    if (INCLUDE_DATAPREP) {
        t1 <- proc.time()[3]
    }
    # since R converts a matrix to a vector in column-major fashion, if the
    # C++ function expects row-major format we must transpose A before
    # converting it to a vector
    if (ROWMAJOR) {
        A <- t(A)
    }
    dim(A) <- dim(A)[1] * dim(A)[2]
    if (!INCLUDE_DATAPREP) {
        t1 <- proc.time()[3]
    }
    y2 <- .Call("matvecMultiply", as.double(A), as.double(x),
                as.logical(c(ROWMAJOR)))
    tCall <- tCall + proc.time()[3] - t1

    diffVec[i] <- max(abs(y2 - y1))
}
cat("Data prep time for '.Call' included: ", INCLUDE_DATAPREP, "\n")
cat("C++ function expects row-major matrix: ", ROWMAJOR, "\n")
cat("Time - Using '%*%' operator in R: ", tR, "sec\n")
cat("Time - Using '.Call' function: ", tCall, "sec\n")
cat("Maximum difference between methods: ", max(diffVec), "\n")

dyn.unload("mvMultiply.so")
    ---------------------------------------
    # mvMultiply.cc
#include <Rinternals.h>
#include <R.h>

extern "C" {

SEXP matvecMultiply(SEXP A, SEXP x, SEXP rowmajor) {
    double *rA = REAL(A), *rx = REAL(x), *ry;
    int *rrm = LOGICAL(rowmajor);
    int n = length(x);
    int m = length(A) / n;  /* A holds m*n doubles */
    SEXP y;
    PROTECT(y = allocVector(REALSXP, m));
    ry = REAL(y);
    for (int i = 0; i < m; i++) {
        ry[i] = 0.0;
        for (int j = 0; j < n; j++) {
            if (rrm[0] == 1) {
                ry[i] += rA[i * n + j] * rx[j];  /* row-major A */
            } else {
                ry[i] += rA[j * m + i] * rx[j];  /* column-major A (R's native layout) */
            }
        }
    }
    UNPROTECT(1);
    return(y);
}

}
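
The .C version timed in the first post is not shown in the thread; a
plausible sketch of the same multiply through the .C interface (hypothetical
code, column-major only) is below. Note that .C duplicates its arguments by
default, which is one plausible contributor to the slower .C timing reported
above.

---------------------------------------
/* hypothetical .C-interface counterpart of matvecMultiply */
extern "C" void matvecMultiplyC(double *A, double *x, int *m, int *n,
                                double *y) {
    for (int i = 0; i < *m; i++) {
        y[i] = 0.0;
        for (int j = 0; j < *n; j++)
            y[i] += A[j * *m + i] * x[j];  /* column-major A */
    }
}
/* called from R as:
   .C("matvecMultiplyC", as.double(A), as.double(x),
      as.integer(m), as.integer(n), y = double(m))$y */
---------------------------------------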


    --
    View this message in context: http://r.789695.n4.nabble.com/Performance-of-C-and-Call-functions-vs-native-R-code-tp3665017p3667896.html
    Sent from the R devel mailing list archive at Nabble.com.
  • Gabriel Becker at Jul 14, 2011 at 3:59 pm
Regarding the second data-preparation line, dim(A) <- dim(A)[1] * dim(A)[2]:
AFAIK R stores matrices as vectors internally anyway, and the dims just tell
it the positions of the various elements, so I'm not sure that line is
needed at all.

I have attached a tiny piece of C code which verifies this. The output I get
from it is:

> dyn.load("/home/gmbecker/gabe/matvectest.so")
> vec = 1.1:8.1
> mat = matrix(vec, ncol = 4)
> .Call("R_MatVecTest", vec, mat, 8L)
[1] TRUE


Note that if you create the matrix with byrow = TRUE, the two may not be the same.
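
The attached C file is not preserved in this archive; a minimal
reconstruction consistent with the call and output above (an assumption, not
necessarily the actual attachment) could be:

---------------------------------------
/* matvectest.c (reconstruction): returns TRUE iff the first n doubles of a
   vector and of a matrix built from it are identical, demonstrating that
   both are the same contiguous column-major array underneath */
#include <Rinternals.h>

SEXP R_MatVecTest(SEXP vec, SEXP mat, SEXP n) {
    int len = INTEGER(n)[0];
    double *v = REAL(vec), *m = REAL(mat);
    for (int i = 0; i < len; i++)
        if (v[i] != m[i]) return ScalarLogical(FALSE);
    return ScalarLogical(TRUE);
}
---------------------------------------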

    Hope that helps,
    Gabe



    --
    Gabriel Becker
    Graduate Student
    Statistics Department
    University of California, Davis
  • Alireza Mahani at Jul 14, 2011 at 4:18 pm
You are absolutely right, Gabe! I removed the line 'dim(A) <- dim(A)[1] *
dim(A)[2]' and my code still executes properly. As you said, matrices are
internally stored as one-dimensional arrays (column-major by default); R
merely exposes them differently by attaching attributes to them, but as far
as C/C++ code is concerned there is no distinction.
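
Concretely, the data-preparation lines can be dropped altogether if the
compiled function reads the dim attribute itself; a hedged sketch of such a
variant (hypothetical, not code posted in the thread):

---------------------------------------
#include <Rinternals.h>

/* hypothetical variant of matvecMultiply that accepts the matrix as-is:
   no t(A), no dim-stripping, no as.double() copy needed on the R side */
extern "C" SEXP matvecMultiply2(SEXP A, SEXP x) {
    SEXP dims = Rf_getAttrib(A, R_DimSymbol);
    int m = INTEGER(dims)[0], n = INTEGER(dims)[1];
    double *rA = REAL(A), *rx = REAL(x);
    SEXP y = PROTECT(Rf_allocVector(REALSXP, m));
    double *ry = REAL(y);
    for (int i = 0; i < m; i++) {
        ry[i] = 0.0;
        for (int j = 0; j < n; j++)
            ry[i] += rA[j * m + i] * rx[j];  /* column-major, R's layout */
    }
    UNPROTECT(1);
    return y;
}
/* called from R simply as: .Call("matvecMultiply2", A, x) */
---------------------------------------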

    Many thanks,
    Alireza


    --
    View this message in context: http://r.789695.n4.nabble.com/Performance-of-C-and-Call-functions-vs-native-R-code-tp3665017p3668047.html
    Sent from the R devel mailing list archive at Nabble.com.
  • Douglas Bates at Jul 19, 2011 at 3:00 pm

I realize that you are just beginning to use the .C and .Call
interfaces and your example is therefore a simple one. However, if
you plan to continue with such development it is worthwhile learning
some of the tools available. I think one of the most important is the
"inline" package, which can take a C or C++ code segment as a text
string and go through all the steps of creating and loading a
.Call'able compiled function.

Second, if you are going to use C++ (the code you show could be C code,
as it doesn't use any C++ extensions), then you should look at the Rcpp
package written by Dirk Eddelbuettel and Romain Francois, which allows
for comparatively painless interfacing of R and C++ objects.
The Rcpp-devel list, which I have copied on this reply, is for
questions related to that system. The inline package allows for
various "plugin" constructions to wrap your code in the appropriate
headers and point the compiler to the locations of header files and
libraries. There are two extensions to Rcpp for numerical linear
algebra in C++, RcppArmadillo and RcppEigen. I show the use of
RcppEigen here.

Third, there are several packages in R that do the busy work of
benchmarking expressions and neatly formatting the results. I use
the rbenchmark package.

    Putting all these together yields the enclosed script and results.

In Eigen, a MatrixXd object is the equivalent of R's numeric matrix
(similarly MatrixXi for integer and MatrixXcd for complex) and a
VectorXd object is the equivalent of a numeric vector. A "mapped"
matrix or vector is one that uses the storage allocated by R, thereby
avoiding a copy operation (similar to your accessing elements of the
arrays through the pointer returned by REAL()). To adhere to R's
functional programming semantics it is a good idea to declare such
objects as const. The 'as' and 'wrap' functions are provided by Rcpp,
with extensions to the Eigen classes in the development version of
RcppEigen; in the released versions of Rcpp and RcppEigen we use
intermediate Rcpp objects. These functions have the advantage of
checking the types of the R objects being passed, and the Eigen code
for matrix multiplication will check the consistency of dimensions in
the operation.
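
Stripped of the inline/cxxfunction scaffolding shown in the transcripts
below, the mapped-multiply kernel amounts to something like the following
sketch (assumptions: A is a numeric matrix, x a numeric vector, Eigen
headers on the include path; the function name is hypothetical):

---------------------------------------
#include <Eigen/Dense>
#include <Rinternals.h>

extern "C" SEXP eigen_matvec(SEXP A, SEXP x) {
    SEXP dims = Rf_getAttrib(A, R_DimSymbol);
    const int m = INTEGER(dims)[0], n = INTEGER(dims)[1];
    /* Map<> reuses R's storage directly: no copy of A or x is made */
    const Eigen::Map<Eigen::MatrixXd> Ae(REAL(A), m, n);
    const Eigen::Map<Eigen::VectorXd> xe(REAL(x), n);
    SEXP y = PROTECT(Rf_allocVector(REALSXP, m));
    Eigen::Map<Eigen::VectorXd>(REAL(y), m) = Ae * xe;  /* y = A %*% x */
    UNPROTECT(1);
    return y;
}
/* from R: .Call("eigen_matvec", A, x) after compiling against Eigen */
---------------------------------------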

    Given the size of the matrix you are working with it is not surprising
    that interpretation overhead and checking will be a large part of the
    elapsed time, hence the relative differences between different methods
    of doing the numerical calculation will be small. The operation of
    multiplying a 100 x 10 matrix by a 10-vector involves "only" 1000
    floating point operations. Furthermore, each element of the matrix is
    used only once so sophisticated methods of manipulating cache contents
    won't buy you much. These benchmark results are from a system that
    uses Atlas BLAS (basic linear algebra subroutines); other systems will
    provide different results. Interestingly, I found on some systems
    using R's BLAS, which are not accelerated, the R code is closer in
    speed to the code using Eigen. An example is given in the second
    version of the output.
---------------------------------------
# first _Rout.txt (the system using Atlas BLAS)

    R Under development (unstable) (2011-07-19 r56429)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

    Natural language support but running in an English locale

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.
> library(rbenchmark)
> library(inline)
> library(RcppEigen)
Loading required package: Rcpp
> ## create an R function from Eigen-based C++ code
> ff <- if (packageVersion("RcppEigen") > "0.1.1") { # development version
+ cxxfunction(signature(Xs = "matrix", ys = "numeric"), '
+ typedef Eigen::Map<Eigen::MatrixXd> MMatrixXd;
+ typedef Eigen::Map<Eigen::VectorXd> MVectorXd;
+
+ const MMatrixXd Xe(as<MMatrixXd>(Xs));
+ const MVectorXd ye(as<MVectorXd>(ys));
+ return wrap(Xe * ye);
+ ', plugin = "RcppEigen")
+ } else {
+ cxxfunction(signature(Xs = "matrix", ys = "numeric"), '
+ typedef Eigen::Map<Eigen::MatrixXd> MMatrixXd;
+ typedef Eigen::Map<Eigen::VectorXd> MVectorXd;
+
+ const NumericMatrix X(Xs);
+ const NumericVector y(ys);
+ const MMatrixXd Xe(X.rows(), X.cols(), X.begin());
+ const MVectorXd ye(y.size(), y.begin());
+ Eigen::VectorXd res = Xe * ye;
+ return NumericVector(res.data(), res.data() + res.size());
+ ', plugin = "RcppEigen")
+ }
> set.seed(1) # for reproducible results
> m <- 100
> n <- 10
> A <- matrix(runif(m * n), ncol = n)
> y <- runif(n)
> all.equal(as.vector(A %*% y), ff(A, y))
[1] TRUE
> benchmark(Rcode = expression(A %*% y), Eigen = ff(A, y), replications = 100000)
   test replications elapsed relative user.self sys.self user.child sys.child
2 Eigen       100000   1.198 1.000000      1.19     0.00          0         0
1 Rcode       100000   1.890 1.577629      1.88     0.01          0         0
> m <- 1000
> n <- 1000
> A <- matrix(runif(m * n), ncol = n)
> y <- runif(n)
> all.equal(as.vector(A %*% y), ff(A, y))
[1] TRUE
> benchmark(Rcode = expression(A %*% y), Eigen = ff(A, y), replications = 1000)
   test replications elapsed relative user.self sys.self user.child sys.child
2 Eigen         1000   3.259 1.000000      3.25     0.00          0         0
1 Rcode         1000  15.648 4.801473     17.27     0.39          0         0
> proc.time()
   user  system elapsed
 31.070   1.160  30.495
---------------------------------------
# second _Rout.txt (a system using R's unaccelerated reference BLAS)

    R version 2.13.1 (2011-07-08)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-unknown-linux-gnu (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

    Natural language support but running in an English locale

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.
> library(rbenchmark)
> library(inline)
> library(RcppEigen)
Loading required package: Rcpp
> ## create an R function from Eigen-based C++ code
> ff <- if (packageVersion("RcppEigen") > "0.1.1") { # development version
+ cxxfunction(signature(Xs = "matrix", ys = "numeric"), '
+ typedef Eigen::Map<Eigen::MatrixXd> MMatrixXd;
+ typedef Eigen::Map<Eigen::VectorXd> MVectorXd;
+
+ const MMatrixXd Xe(as<MMatrixXd>(Xs));
+ const MVectorXd ye(as<MVectorXd>(ys));
+ return wrap(Xe * ye);
+ ', plugin = "RcppEigen")
+ } else {
+ cxxfunction(signature(Xs = "matrix", ys = "numeric"), '
+ typedef Eigen::Map<Eigen::MatrixXd> MMatrixXd;
+ typedef Eigen::Map<Eigen::VectorXd> MVectorXd;
+
+ const NumericMatrix X(Xs);
+ const NumericVector y(ys);
+ const MMatrixXd Xe(X.begin(), X.rows(), X.cols());
+ const MVectorXd ye(y.begin(), y.size());
+ Eigen::VectorXd res = Xe * ye;
+ return NumericVector(res.data(), res.data() + res.size());
+ ', plugin = "RcppEigen")
+ }
> set.seed(1) # for reproducible results
> m <- 100
> n <- 10
> A <- matrix(runif(m * n), ncol = n)
> y <- runif(n)
> all.equal(as.vector(A %*% y), ff(A, y))
[1] TRUE
> benchmark(Rcode = expression(A %*% y), Eigen = ff(A, y), replications = 100000)
   test replications elapsed relative user.self sys.self user.child sys.child
2 Eigen       100000   0.808 1.000000     0.808    0.001          0         0
1 Rcode       100000   1.037 1.283416     1.031    0.006          0         0
> m <- 1000
> n <- 1000
> A <- matrix(runif(m * n), ncol = n)
> y <- runif(n)
> all.equal(as.vector(A %*% y), ff(A, y))
[1] TRUE
> benchmark(Rcode = expression(A %*% y), Eigen = ff(A, y), replications = 1000)
   test replications elapsed relative user.self sys.self user.child sys.child
2 Eigen         1000   1.957 1.000000     1.956        0          0         0
1 Rcode         1000   6.837 3.493613     6.835        0          0         0
> proc.time()
   user  system elapsed
 14.178   0.388  14.527
  • Douglas Bates at Jul 19, 2011 at 3:09 pm
I just saw that I left a syntax error in the .R and the first
_Rout.txt files. Notice that in the second _Rout.txt file the order
of the arguments in the constructors for the MMatrixXd and MVectorXd
differs from the .R and first _Rout.txt files. The correct order has
the pointer first, then the dimensions: MMatrixXd Xe(X.begin(),
X.rows(), X.cols()), not MMatrixXd Xe(X.rows(), X.cols(), X.begin()).
For the first _Rout.txt file this part of the code is not used,
presumably because the development-version branch of the if() is
taken instead.
  • Alireza Mahani at Jul 19, 2011 at 7:13 pm
    Prof. Bates,

It looks like you read my mind! I am working on an R package for
high-performance MCMC estimation of a class of hierarchical Bayesian models
most often used in the field of quantitative marketing; it would essentially
be a parallelized version of Peter Rossi's bayesm package. While I've made
great progress in parallelizing the most mathematically difficult part of
the algorithm, namely slice sampling of the low-level coefficients, I've
realized that putting the entire code together while minimizing bugs is a
big challenge in C/C++/CUDA environments. I have therefore decided to follow
the more logical path of first developing the code logic in R, and then
exporting it, function by function, to compiled code.

The tools that you mentioned seem to be exactly the kind of thing I need in
order to go through this incremental, test-oriented development process with
relatively little pain.

I'm not sure if this is what you had in mind when suggesting these tools, so
please let me know if I'm misinterpreting your comments, or if I need to be
aware of other tools beyond those you mentioned.

    Many thanks,
    Alireza


    --
    View this message in context: http://r.789695.n4.nabble.com/Performance-of-C-and-Call-functions-vs-native-R-code-tp3665017p3679056.html
    Sent from the R devel mailing list archive at Nabble.com.
