Dear All,

I am currently analyzing an RNA-seq dataset with 3 sample groups of n=4 each.
I was exploring the performance of both edgeR and DESeq, and I noticed that
the spread of their normalization factors differs a lot.
With edgeR's calcNormFactors the factors range from 0.9 to 1.2, while with
DESeq's estimateSizeFactors they range from 0.4 to 1.7. Given that these are
exactly the same libraries, why do the estimates vary so much? How will that
impact the list of DE genes?
I know that the calculations are not performed in the same way, but aren't
those two functions aimed at estimating the same phenomenon?

Thanks for your help.

--
Lucia Peixoto PhD
Postdoctoral Research Fellow
Laboratory of Dr. Ted Abel
Department of Biology
School of Arts and Sciences
University of Pennsylvania

"Think boldly, don't be afraid of making mistakes, don't miss small
details, keep your eyes open, and be modest in everything except your
aims."
Albert Szent-Gyorgyi


  • Simon Anders at Mar 15, 2013 at 6:58 pm
    Hi Lucia

    edgeR's normalization factors are relative to the total read count
    (library size), whereas DESeq's size factors are not. So, if you want to
    compare them, you have to multiply the factors from edgeR by the total
    read counts and divide by some suitably large number.


    So, if sf is the vector of size factors from DESeq, nf is the vector of
    normalization factors from edgeR, and rs is the vector of column sums of
    the count matrix, I would expect that


    plot( sf, rs * nf )


    gives a plot with the points lying roughly on a straight line.


    Simon
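
The relationship described above can be illustrated without either package. The following is only a sketch under stated assumptions: the counts are simulated, and `median_of_ratios` is a hypothetical base-R reimplementation of the median-of-ratios idea behind DESeq's size factors, not the package's own code. It shows that such size factors track library size (hence a wide range like 0.4-1.7), and that dividing out the library size and rescaling to a geometric mean of 1 gives factors on an edgeR-like scale that sit close to 1.

```r
# Simulated counts: 1000 genes x 12 samples with deliberately unequal depth.
set.seed(1)
n_genes   <- 1000
n_samples <- 12
depth  <- seq(0.4, 1.7, length.out = n_samples)    # relative sequencing depth
mu     <- rgamma(n_genes, shape = 2, rate = 0.01)  # per-gene mean expression
counts <- matrix(rpois(n_genes * n_samples, lambda = outer(mu, depth)),
                 nrow = n_genes)

# Hypothetical helper: median-of-ratios size factors in base R.
# Genes with a zero count in any sample have a geometric mean of 0 and are skipped.
median_of_ratios <- function(counts) {
  log_geo_mean <- rowMeans(log(counts))
  keep <- is.finite(log_geo_mean)
  apply(counts, 2, function(k) exp(median(log(k[keep]) - log_geo_mean[keep])))
}

sf <- median_of_ratios(counts)  # DESeq-style size factors
rs <- colSums(counts)           # library sizes (column sums)

# Remove the library-size component and rescale to geometric mean 1,
# which is the scale edgeR's normalization factors live on.
nf_like <- sf / rs
nf_like <- nf_like / exp(mean(log(nf_like)))

print(range(sf))       # wide: dominated by sequencing depth
print(range(nf_like))  # narrow: depth has been divided out
```

In this simulation there is no composition bias, so the rescaled factors are all near 1; with real data they would differ from 1 where the two methods detect compositional differences between libraries.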
  • Ryan C. Thompson at Mar 15, 2013 at 10:43 pm
    If you use the "getOffset" function for your DGEList object and the
    following function for your CountDataSet object, you will get offset
    values that are directly comparable:


    library(DESeq)
    library(edgeR)
    library(ggplot2)

    # Offset for a DESeq CountDataSet: log size factors, re-centred so that
    # their mean equals the mean log library size.
    getOffset.CountDataSet <- function(y) {
        if (any(is.na(sizeFactors(y))))
            stop("Call estimateSizeFactors first")
        log(sizeFactors(y)) - mean(log(sizeFactors(y))) +
            mean(log(colSums(counts(y))))
    }

    # Run the same example counts through both normalization methods.
    cds <- makeExampleCountDataSet()
    cds <- estimateSizeFactors(cds)
    dge <- DGEList(counts=counts(cds), group=pData(cds)$condition)
    dge <- calcNormFactors(dge)

    # Points on the identity line mean the two offsets agree.
    qplot(x=getOffset(dge), y=getOffset.CountDataSet(cds)) +
        labs(title="Offsets, DESeq vs edgeR",
             x="edgeR offset", y="DESeq offset") +
        coord_equal() +
        geom_abline(slope=1, intercept=0)
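
The arithmetic behind these offsets can be checked on plain vectors, without either package. This is a minimal sketch with made-up numbers: `rs`, `nf`, and `sf` are hypothetical library sizes, edgeR-style normalization factors, and DESeq-style size factors. It assumes that the edgeR offset is the log of the effective library size (library size times normalization factor), that the normalization factors have a geometric mean of 1, and that the size factors are exactly proportional to the effective library sizes. Under those assumptions the re-centred DESeq offset equals the edgeR offset.

```r
# Hypothetical inputs, for illustration only.
rs <- c(8e6, 1.2e7, 1.0e7, 1.5e7)   # library sizes (column sums of the counts)
nf <- c(0.95, 1.10, 1.02, 0.94)     # edgeR-style normalization factors
nf <- nf / exp(mean(log(nf)))       # rescale so their geometric mean is 1
sf <- (rs * nf) / 1e7               # size factors proportional to rs * nf

# edgeR-style offset: log effective library size.
edger_offset <- log(rs * nf)

# DESeq-style offset, re-centred as in getOffset.CountDataSet above.
deseq_offset <- log(sf) - mean(log(sf)) + mean(log(rs))

print(all.equal(edger_offset, deseq_offset))
```

The arbitrary constant (here 1e7) cancels in the re-centring step, which is why the choice of "big number" does not matter. On real data the two methods estimate the factors differently, so the offsets should agree only approximately.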



    _______________________________________________
    Bioconductor mailing list
    Bioconductor at r-project.org
    https://stat.ethz.ch/mailman/listinfo/bioconductor
    Search the archives:
    http://news.gmane.org/gmane.science.biology.informatics.conductor
