Florian's post about mclapply got me thinking about how it is kind of a
pain to iterate over GRanges objects (since they are not Lists, there is no
lapply). Could we instead have an apply function for vectors that subsets,
i.e., uses [, instead of [[? Maybe sblapply for single bracket? I was also
thinking it would be nice to have an apply function for Seqinfo objects
that would apply over the subranges of all of the sequences, where the size
of the subregion is specified by the user. Maybe call it glapply, where 'g'
is for 'genome'?
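
A minimal sketch of what the proposed sblapply could look like, in base R only. The name comes from the post above; the signature and body are assumptions, not an existing Bioconductor function. It relies only on length, `[` and names, so it should in principle also work on a GRanges, though that is not tested here.

```r
# Hypothetical 'sblapply': like lapply(), but each element is extracted
# with `[` (a length-1 subset of the same class) rather than `[[`.
sblapply <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)
  out <- lapply(seq_along(X), function(i) FUN(X[i], ...))
  names(out) <- names(X)
  out
}

# Base R illustration: `[` keeps the names of the vector, `[[` would not.
x <- c(chr1 = 10L, chr2 = 20L)
res <- sblapply(x, function(piece) names(piece))
```

Because each piece is a single-bracket subset, FUN sees an object of the same class as X, which is the point of the proposal.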

Michael


  • Tim Triche, Jr. at Sep 24, 2012 at 2:16 pm
    Did you see Malcolm Cook's post recently about fixing pvec() to automatically do this?


    It seems like a sensible approach to me


    --t


    _______________________________________________
    Bioc-devel at r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/bioc-devel
  • Cook, Malcolm at Sep 24, 2012 at 2:58 pm

    Thanks Tim, I was poised to chime in...


    As Tim said, we recently discussed making pvec work with Lists (including GRangesList), where I offered:

    Here is a (better?) version that does: https://gist.github.com/3757873
    Comments? Improvements? Is it better?

    I had intended to correspond with the author of parallel on this matter. Is that you, Michael?

    Michael, I think I also have a working version of your sblapply, more or less.

    Indeed, I would not be surprised if that is what the OP was really hoping for (guessing here), allowing for a parallel version.

    I think I have done it in a way that supports multicore/parallel and possibly other back ends, which Vincent observed is desirable.

    I will gist it later today for consideration.

    In the meantime, I would appreciate anyone trying, criticizing, fixing, or amending my pvec redux (above).


    Cheers,


    --Malcolm

  • Martin Morgan at Sep 24, 2012 at 4:08 pm

    From looking at your earlier gist, I thought you'd identified two
    important distinctions: pvec vs. mclapply, and generic pvec vs. making
    pvec work with a well-defined API.


    With pvec vs. mclapply (and parallelizing over GRangesList in the first
    place), it's worth keeping in mind that, at least for simple operations,
    the unlist-update-relist workflow is often very, very fast. I'll get the
    details wrong, but for instance 'disjoin' on a GRangesList is implemented as

    selectMethod(disjoin, "GRangesList")
    Method Definition:

    function (x, ...)
    {
        gr <- deconstructGRLintoGR(x)
        d <- disjoin(gr, ...)
        reconstructGRLfromGR(d, x)
    }
    <environment: namespace:GenomicRanges>

    Signatures:
            x
    target  "GRangesList"


    Neither deconstructGRLintoGR nor reconstructGRLfromGR is exported; I'm
    trying to convey the general idea rather than practical implementation
    advice. If this sounds like what you want to do, then it would be good to
    have that as a separate thread with some more details.
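
    The unlist-update-relist idea can be sketched in base R. This is only an illustration under stated assumptions: a plain list of numeric vectors stands in for a GRangesList, and centering each group on its mean stands in for a group-aware operation such as disjoin. It is not how GenomicRanges implements anything.

    ```r
    # A plain list stands in for a GRangesList (assumption for illustration).
    grouped <- list(a = c(1, 2, 3), b = c(10, 20), c = numeric(0))

    # 1. unlist: flatten, remembering which group each element came from
    flat  <- unlist(grouped, use.names = FALSE)
    group <- rep(seq_along(grouped), lengths(grouped))

    # 2. update: one vectorized, group-aware call over all elements at once
    #    (here: subtract each group's mean from its elements)
    flat <- flat - ave(flat, group)

    # 3. relist: restore the original grouping, keeping empty groups
    updated <- split(flat, factor(group, levels = seq_along(grouped)))
    names(updated) <- names(grouped)
    ```

    The result matches what lapply(grouped, function(v) v - mean(v)) would give, but the work in step 2 is a single call rather than one call per list element.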


    With respect to pvec with well-defined API, to me the 'right' thing to
    do is to revise pvec as you suggest, but without making additional
    changes -- the minimum necessary to accomplish the goal -- and then to
    communicate on the R-devel mailing list, perhaps cc'ing Simon Urbanek.


    To do this effectively, it would be good to patch the R source -- we
    need to get dispatch right inside the package. The way to do this is
    probably


    mkdir ~/src
    cd ~/src
    svn co https://svn.r-project.org/R/trunk R-devel
    cd R-devel
    tools/rsync-recommended


    then build in a separate directory


    mkdir -p ~/bin/R-devel
    cd ~/bin/R-devel
    ~/src/R-devel/configure
    make -j


    then patch ~/src/R-devel/src/library/parallel and quickly rebuild the binary


    cd ~/bin/R-devel/src/library/parallel
    make


    and finally present the patch as


    svn diff ~/src/R-devel


    Martin




    --
    Computational Biology / Fred Hutchinson Cancer Research Center
    1100 Fairview Ave. N.
    PO Box 19024 Seattle, WA 98109


    Location: Arnold Building M1 B861
    Phone: (206) 667-2793
  • Michael Lawrence at Sep 24, 2012 at 6:30 pm
    Sorry, I didn't mean to indicate that I was talking just about
    parallelization. The need is for a general apply function that uses [
    instead of [[. This usually comes up in Vectors that are not Lists, like
    GRanges or Seqinfo. For GRangesList, lapply works as intended, and as
    Martin says, it's usually best not to apply at all. Sometimes, however, it
    is easiest to just apply. For example, if I wanted to apply over the
    chromosomes and perform some operation based on the range of the entire
    chromosome, then for some Seqinfo of interest, I could do:

    sblapply(si, fun) or sblapply(as(si, "GenomicRanges"), fun)

    The primary use case really is something like applying over chromosomes,
    which is why I suggested the high-level glapply for traversing the whole
    genome. Of course, we want this to support parallel computing, so the
    region-size feature was to get around the uneven lengths of chromosomes,
    which make it difficult to make effective use of resources.
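
    A rough sketch of the region-size idea, in base R. The name glapply is from this thread, but the signature, the apply_fun argument and the one-row data frame passed to FUN are all assumptions made for illustration. The sequence lengths are supplied as a named vector, such as seqlengths() would return for a Seqinfo.

    ```r
    # Hypothetical 'glapply': tile every sequence into regions of at most
    # `width` bases and apply FUN to each region (a one-row data frame with
    # columns seqname, start, end).
    glapply <- function(seqlengths, width, FUN, ..., apply_fun = lapply) {
      stopifnot(width >= 1, all(seqlengths >= 1))
      regions <- do.call(rbind, lapply(names(seqlengths), function(sn) {
        starts <- seq(1, seqlengths[[sn]], by = width)
        data.frame(seqname = sn,
                   start = starts,
                   end = pmin(starts + width - 1, seqlengths[[sn]]),
                   stringsAsFactors = FALSE)
      }))
      apply_fun(seq_len(nrow(regions)), function(i) FUN(regions[i, ], ...))
    }

    # Toy lengths, not a real genome.
    sl <- c(chrA = 250L, chrB = 90L)
    widths <- unlist(glapply(sl, 100, function(r) r$end - r$start + 1))
    ```

    Since the regions are roughly equal in size, passing a parallel apply such as parallel::mclapply as apply_fun should spread the work more evenly than one task per chromosome would.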

    Michael
  • Tim Triche, Jr. at Sep 24, 2012 at 6:44 pm
    Then call it chrapply or chromapply... I don't think anybody will ever
    remember sblapply.

    I wrote a function called byChr in regulatoR that splits up a
    SummarizedExperiment by chromosome arm, applies a function to it (hadn't
    got around to doing a nice job with pvec, so I just called mclapply()
    across the names of the split list), and then puts it back together. The
    appeal of doing it by arm is that you don't have some nodes hanging around
    waiting for chrX and chr1 after finishing chrY in a few seconds :-)

    --
    *A model is a lie that helps you see the truth.*
    Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>
  • Tim Triche, Jr. at Sep 24, 2012 at 7:04 pm
    Oops, I meant processes, not nodes. I have been running jobs on actual
    nodes and (as a bonus) not sleeping as much as I ought to lately. My
    apologies.

    So the idea of applying by chromosome arm is that individual processes will
    not be left idle after finishing an itty-bitty chromosome.
    If you have, say, 24 cores, and you apply per-arm, the odds are that it
    will finish a good deal faster than if you applied it per-chromosome (for
    hg19, at least).

    Naturally this is not helpful if you either don't know where the
    centromeres are, or the task at hand really does need to run over entire
    chromosomes at once.
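
    The load-balancing argument can be illustrated with a toy model in base R. Everything here is an assumption for illustration: the task sizes are made up (not hg19 lengths), each sequence is split into two equal "arms" (real arms are unequal), and tasks are handed out greedily, which is not necessarily how mclapply schedules work.

    ```r
    # Greedy scheduling: give each task (largest first) to the least-loaded
    # worker and report the total load of the busiest worker (the makespan).
    makespan <- function(task_sizes, n_workers) {
      load <- numeric(n_workers)
      for (s in sort(task_sizes, decreasing = TRUE)) {
        w <- which.min(load)
        load[w] <- load[w] + s
      }
      max(load)
    }

    # Made-up sizes in arbitrary units: two large and two tiny sequences.
    whole <- c(8, 6, 1, 1)
    # Split every sequence into two equal halves.
    arms <- rep(whole / 2, each = 2)

    per_chromosome <- makespan(whole, n_workers = 4)
    per_arm <- makespan(arms, n_workers = 4)
    ```

    With these numbers the per-chromosome run is bounded by the largest sequence (8 units), while the per-arm run finishes in 4 units because the small pieces fill in behind the large ones.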



Discussion Overview
group: bioc-devel
categories: r
posted: Sep 24, '12 at 1:53p
active: Sep 24, '12 at 7:04p
posts: 7
users: 4
website: bioconductor.org
irc: #r