Dear All,

I am currently analyzing an RNA-Seq dataset; I have 3 samples with n=4 each.
I was exploring the performance of both edgeR and DESeq, and I noticed that
the normalization factors they produce differ a lot in their spread.
Using edgeR's calcNormFactors I get values ranging from 0.9 to 1.2,
while DESeq's estimateSizeFactors gives values ranging from
0.4 to 1.7. Given that these are exactly the same libraries,
why do the estimates vary so much? How will that impact the list of DE genes?
I know that the calculations are not performed in the same way, but aren't
those two functions aimed at estimating the same phenomenon?

--
Lucia Peixoto PhD
Postdoctoral Research Fellow
Laboratory of Dr. Ted Abel
Department of Biology
School of Arts and Sciences
University of Pennsylvania

"Think boldly, don't be afraid of making mistakes, don't miss small
details, keep your eyes open, and be modest in everything except your
aims."
Albert Szent-Gyorgyi

Reply posted Mar 15, 2013 at 6:58 pm
Hi Lucia

edgeR's normalization factors are relative to the total read count, and
DESeq's size factors aren't. So, if you want to compare them, you have to
multiply the factors from edgeR by the total read counts and divide by some
suitable big number.

So, if sf is the vector of size factors from DESeq, nf is the vector of
normalization factors from edgeR, and rs is the vector of column
sums of the count matrix, I would expect that

plot( sf, rs * nf )

gives a plot with the points lying roughly on a straight line.
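
The rescaling described above can be sketched in base R. This is only an illustration: the helper name `rescaleNormFactors` is hypothetical, the input values are made up, and the geometric mean of the effective library sizes is just one possible choice for the "suitable big number". The result should be roughly proportional to the DESeq size factors, not identical to them.

```r
# Hypothetical helper: turn edgeR normalization factors into values on a
# size-factor-like scale by forming effective library sizes (nf * rs)
# and dividing by their geometric mean.
rescaleNormFactors <- function(nf, rs) {
  eff <- nf * rs                 # effective library sizes
  eff / exp(mean(log(eff)))      # rescale so the geometric mean is 1
}

# Made-up values for illustration only (not taken from the thread)
rs <- c(2.0e6, 1.0e6, 4.0e6, 2.0e6)   # column sums of the count matrix
nf <- c(1.0, 1.1, 0.9, 1.0)           # edgeR-style normalization factors

sfLike <- rescaleNormFactors(nf, rs)
print(sfLike)
```

With real data, `plot(sf, sfLike)` would then be expected to show points close to a straight line through the origin.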

Simon
Reply posted Mar 15, 2013 at 10:43 pm
If you use the "getOffset" function for your DGEList object and the
following function for your CountDataSet object, you will get offset
values that are directly comparable:

library(DESeq)
library(edgeR)
library(ggplot2)

# DESeq counterpart of edgeR's getOffset(): log size factors, centered and
# shifted onto the scale of the log library sizes
getOffset.CountDataSet <- function(y) {
  if (any(is.na(sizeFactors(y))))
    stop("Call estimateSizeFactors first")
  log(sizeFactors(y)) - mean(log(sizeFactors(y))) +
    mean(log(colSums(counts(y))))
}

# Use the same example counts in both packages
cds <- makeExampleCountDataSet()
cds <- estimateSizeFactors(cds)
dge <- DGEList(counts=counts(cds), group=pData(cds)$condition)
dge <- calcNormFactors(dge)

# Points close to the identity line mean the two offsets agree
qplot(x=getOffset(dge), y=getOffset.CountDataSet(cds)) +
  labs(title="Offsets, DESeq vs edgeR",
       x="edgeR offset", y="DESeq offset") +
  coord_equal() +
  geom_abline(slope=1, intercept=0)
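
The centering used in getOffset.CountDataSet can also be tried without either package. The sketch below is an assumption-laden illustration: the counts are simulated, the size factors are computed by hand with a median-of-ratios rule (which is my understanding of what estimateSizeFactors does), and the comparison is against plain log library sizes, i.e. edgeR-style offsets with all normalization factors taken as 1 rather than TMM factors.

```r
set.seed(1)

# Simulated counts (made-up data): 1000 genes x 4 samples, no differential
# expression, samples differing only in sequencing depth
depth  <- c(1, 2, 0.5, 1.5)
mu     <- rexp(1000, rate = 1 / 100)     # per-gene baseline means
counts <- sapply(depth, function(d) rpois(1000, lambda = mu * d))

# Median-of-ratios size factors computed by hand
logGeoMean <- rowMeans(log(counts))      # -Inf for genes with any zero count
keep <- is.finite(logGeoMean)            # drop those genes
sf <- apply(counts, 2, function(col) {
  exp(median(log(col[keep]) - logGeoMean[keep]))
})

# Same centering as getOffset.CountDataSet: log size factors shifted onto
# the scale of the log library sizes
offsetSF  <- log(sf) - mean(log(sf)) + mean(log(colSums(counts)))

# Simplification: log library sizes, i.e. normalization factors all equal to 1
offsetLib <- log(colSums(counts))

print(cbind(offsetSF, offsetLib))
```

By construction the two offset vectors have the same mean, so any remaining differences reflect how the two approaches distribute the normalization across samples.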

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor

Discussion overview: group bioconductor, categories r, posted Mar 15, '13 at 3:43p, active Mar 15, '13 at 10:43p, 3 posts, 3 users, website bioconductor.org, irc #r
