Grokbase Groups R r-help March 2002
FAQ
If anyone has a very fast vectorized method for doing the following I would appreciate some help. I want to avoid outer() to limit memory problems for very large n.

Let

x = real vector of length n
y = real vector of length n
w = real vector of length m, m typically less than n/2 but can be > n
z = real vector of length m

For w[i], i=1,,,m, find the value of x that is closest to w[i]. In the case of ties, select one (optimally at random or just take the first match). Let z[i] = value of y corresponding to the closest x.

If there is a Fortran or C function (builtin or otherwise) that does much of the work, all the better.

Thanks,

Frank
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Search Discussions

  • Peter Dalgaard BSA at Mar 28, 2002 at 1:50 pm

    Frank E Harrell Jr <fharrell@virginia.edu> writes:

    If anyone has a very fast vectorized method for doing the following I would appreciate some help. I want to avoid outer() to limit memory problems for very large n.

    Let

    x = real vector of length n
    y = real vector of length n
    w = real vector of length m, m typically less than n/2 but can be > n
    z = real vector of length m

    For w[i], i=1,,,m, find the value of x that is closest to w[i]. In the case of ties, select one (optimally at random or just take the first match). Let z[i] = value of y corresponding to the closest x.

    If there is a Fortran or C function (builtin or otherwise) that does much of the work, all the better.
    Sounds rather close to what approx() does in the stepfunction cases,
    except that you want jumps at the midpoints between the x values
    rather than at the x themselves.

    --
    O__ ---- Peter Dalgaard Blegdamsvej 3
    c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
    (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
    ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
    -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
    r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
    Send "info", "help", or "[un]subscribe"
    (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
    _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
  • Thomas Lumley at Mar 28, 2002 at 4:54 pm

    On Thu, 28 Mar 2002, Frank E Harrell Jr wrote:

    If anyone has a very fast vectorized method for doing the following I
    would appreciate some help. I want to avoid outer() to limit memory
    problems for very large n.

    Let

    x = real vector of length n
    y = real vector of length n
    w = real vector of length m, m typically less than n/2 but can be > n
    z = real vector of length m

    For w[i], i=1,,,m, find the value of x that is closest to w[i]. In the
    case of ties, select one (optimally at random or just take the first
    match). Let z[i] = value of y corresponding to the closest x.
    I believe the following will work in pre1.5.0 using the new findInterval
    function (modulo any remaining off-by-one errors that should be easy to
    fix)

    # sort them
    oo<-order(x)
    x<-x[oo]
    y<-y[oo]
    # find the right interval for w
    j<-pmax(1,findInterval(w,x))
    xlow<-x[j] #or possibly j-1 and j or something
    xhi<-x[j+1]
    # which end of the interval?
    bestj<-ifelse(xhi-w<w-xlow,j+1,j)
    # get z
    z<-y[j]

    For length(x)^5 it takes 0.5s with length(w)=length(x)/3 and 2.5s with
    length(w)=length(x)*3 on my machine with a current pre1.5.0

    -thomas


    -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
    r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
    Send "info", "help", or "[un]subscribe"
    (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
    _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
  • Mike Lonergan at Mar 28, 2002 at 4:55 pm
    Frank,

    Apologies if I'm being very stupid, but I'd guess that reducing the overall
    complexity would help for large datasets. The following is not vectorised,
    but should be linear apart from the order() operations, which are hopefully
    O(nlogn):

    closest<-function(w,x,y) {
    ow<-order(w)
    ox<-order(x)
    ws<-w[ow]
    xs<-x[ox]
    ys<-y[ox]
    res<-rep(length(x),length(w))
    i<-1
    j<-1

    while(i<=length(w) && j<length(x)) {
    if ( ws[i]<=(xs[j]+xs[j+1])/2 ) {
    res[i]<-j
    i<-i+1
    } else
    j<-j+1
    }
    reshuffle<-order((1:length(w))[ow])
    z<-ys[res[reshuffle]]
    return(z)
    }


    It is all based on sorting the data first so that only m+n comparisons are
    needed rather than mn.

    I'd be glad to know whether I've messed up/completely misunderstood the
    problem (& I've just realised you're more interested in space than time so
    this may all be completely irrelevant.)

    Cheers,

    Mike.

    -----Original Message-----
    From: owner-r-help at stat.math.ethz.ch
    [mailto:owner-r-help at stat.math.ethz.ch]On Behalf Of Frank E
    Harrell Jr
    Sent: 28 March 2002 13:20
    To: rhelp
    Subject: [R] Vectorizing closest match
    >
    >
    If anyone has a very fast vectorized method for doing the
    following I would appreciate some help. I want to avoid
    outer() to limit memory problems for very large n. >
    Let >
    x = real vector of length n
    y = real vector of length n
    w = real vector of length m, m typically less than n/2 but can be > n
    z = real vector of length m >
    For w[i], i=1,,,m, find the value of x that is closest to
    w[i]. In the case of ties, select one (optimally at random
    or just take the first match). Let z[i] = value of y
    corresponding to the closest x. >
    If there is a Fortran or C function (builtin or otherwise)
    that does much of the work, all the better. >
    Thanks, >
    Frank
    --
    Frank E Harrell Jr Prof. of Biostatistics & Statistics
    Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
    U. Virginia School of Medicine
    http://hesweb1.med.virginia.edu/biostat
    -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
    -.-
    r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
    Send "info", "help", or "[un]subscribe"
    (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
    _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
    _._

    -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
    r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
    Send "info", "help", or "[un]subscribe"
    (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
    _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
  • Thomas Lumley at Mar 28, 2002 at 5:04 pm

    On Thu, 28 Mar 2002, Mike Lonergan wrote:

    Frank,

    Apologies if I'm being very stupid, but I'd guess that reducing the overall
    complexity would help for large datasets. The following is not vectorised,
    but should be linear apart from the order() operations, which are hopefully
    O(nlogn):
    <snip actual code>
    It is all based on sorting the data first so that only m+n comparisons are
    needed rather than mn.
    Looking at how findInterval is defined it seems that my solution is
    roughly O(m logn) if n>>m but roughly O(m+n) if n<=m, so there's not that
    much to choose between them in asymptotic complexity.

    -thomas

    -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
    r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
    Send "info", "help", or "[un]subscribe"
    (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
    _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupr-help @
categoriesr
postedMar 28, '02 at 1:19p
activeMar 28, '02 at 5:04p
posts5
users4
websiter-project.org
irc#r

People

Translate

site design / logo © 2017 Grokbase