FAQ
Dear List,

I have a dataset with 3 conditions (control and 2 treatments), 2 replicates each. Initially I carried out a normal analysis in DESeq - each treatment vs the control, but later was told it needed to be a paired analysis.

Thus, just to be consistent I decided to use DESeq itself. However, I am unsure how to extract my comparisons of interest.
------
Code:
design
   condition pair
1 Cont 1
2 Cont 3
3 Trt1 1
4 Trt1 3
5 Trt2 1
6 Trt2 3
cds2 <- newCountDataSet(gct, design)
cds2 <- estimateSizeFactors(cds2)
sizeFactors(cds2)
      Cont_1 Cont_3 Trt1_1 Trt1_3 Trt2_1 Trt2_3
0.9373964 1.1328686 1.0097990 1.0596419 0.8562104 1.0645546

.cds2 <- estimateDispersions(cds2,"pooled-CR",modelFormula=count ~ pair + condition)
fit1 = fitNbinomGLMs(cds2, count ~ pair + condition)
fit0 = fitNbinomGLMs(cds2, count ~ pair)
str(fit1)
'data.frame': 4765 obs. of 6 variables:
$ (Intercept): num 10.19 9.99 6.89 6.48 4.46 ...
$ pair3 : num -0.3018 -0.3222 0.068 0.0468 0.1504 ...
$ conditionTrt1: num 0.17 0.142 -0.495 0.125 0.319 ...
$ conditionTrt2: num 0.1882 0.00112 -0.2704 -0.09412 0.10651 ...
$ deviance : num 0.0742 0.1569 0.9072 2.0612 0.3702 ...
$ converged : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
- attr(*, "df.residual")= num 2
head(fit1)
          (Intercept) pair3 conditionTrt1 conditionTrt2 deviance converged
gene0 10.188336 -0.30184662 0.1698471 0.188197344 0.0741656 TRUE
gene1 9.992372 -0.32221027 0.1415406 0.001119304 0.1568869 TRUE
gene10 6.893693 0.06795915 -0.4954365 -0.270402107 0.9072335 TRUE
gene100 6.477847 0.04684995 0.1250743 -0.094116036 2.0611685 TRUE
gene1002 4.463446 0.15038277 0.3186187 0.106512332 0.3701619 TRUE
gene1003 4.090079 -0.05247162 -0.3560189 -0.054765642 1.9631768 TRUE
pvalsGLM = nbinomGLMTest( fit1, fit0 )
padjGLM = p.adjust(pvalsGLM, method="BH" )
---------

Based on the above how do I extract:

1) Trt1 vs Cont? - is this the 3rd column of fit1, depicted as log2 FoldChange?



2) Trt2 vs Cont? - is this the 4th column of fit1, depicted as log2 FoldChange?



3) What would the p-values for each comparison be? As pvalsGLM and padjGLM gives only one set, and I don't think it will be the same for both comparisons.


Many Thanks,
Natasha

Search Discussions

  • Natasha Sahgal at May 14, 2013 at 11:30 am
    Sorry forgot the sessionInfo! Added at the end now

    --


    From: Natasha Sahgal
    Sent: 14 May 2013 12:29
    To: bioconductor@r-project.org
    Subject: DESeq multifactor design paired analysis

    Dear List,

    I have a dataset with 3 conditions (control and 2 treatments), 2 replicates each. Initially I carried out a normal analysis in DESeq - each treatment vs the control, but later was told it needed to be a paired analysis.

    Thus, just to be consistent I decided to use DESeq itself. However, I am unsure how to extract my comparisons of interest.
    ------
    Code:
    design
       condition pair
    1 Cont 1
    2 Cont 3
    3 Trt1 1
    4 Trt1 3
    5 Trt2 1
    6 Trt2 3
    cds2 <- newCountDataSet(gct, design)
    cds2 <- estimateSizeFactors(cds2)
    sizeFactors(cds2)
          Cont_1 Cont_3 Trt1_1 Trt1_3 Trt2_1 Trt2_3
    0.9373964 1.1328686 1.0097990 1.0596419 0.8562104 1.0645546

    .cds2 <- estimateDispersions(cds2,"pooled-CR",modelFormula=count ~ pair + condition)
    fit1 = fitNbinomGLMs(cds2, count ~ pair + condition)
    fit0 = fitNbinomGLMs(cds2, count ~ pair)
    str(fit1)
    'data.frame': 4765 obs. of 6 variables:
    $ (Intercept): num 10.19 9.99 6.89 6.48 4.46 ...
    $ pair3 : num -0.3018 -0.3222 0.068 0.0468 0.1504 ...
    $ conditionTrt1: num 0.17 0.142 -0.495 0.125 0.319 ...
    $ conditionTrt2: num 0.1882 0.00112 -0.2704 -0.09412 0.10651 ...
    $ deviance : num 0.0742 0.1569 0.9072 2.0612 0.3702 ...
    $ converged : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
    - attr(*, "df.residual")= num 2
    head(fit1)
              (Intercept) pair3 conditionTrt1 conditionTrt2 deviance converged
    gene0 10.188336 -0.30184662 0.1698471 0.188197344 0.0741656 TRUE
    gene1 9.992372 -0.32221027 0.1415406 0.001119304 0.1568869 TRUE
    gene10 6.893693 0.06795915 -0.4954365 -0.270402107 0.9072335 TRUE
    gene100 6.477847 0.04684995 0.1250743 -0.094116036 2.0611685 TRUE
    gene1002 4.463446 0.15038277 0.3186187 0.106512332 0.3701619 TRUE
    gene1003 4.090079 -0.05247162 -0.3560189 -0.054765642 1.9631768 TRUE
    pvalsGLM = nbinomGLMTest( fit1, fit0 )
    padjGLM = p.adjust(pvalsGLM, method="BH" )
    ---------

    Based on the above how do I extract:

    1) Trt1 vs Cont? - is this the 3rd column of fit1, depicted as log2 FoldChange?



    2) Trt2 vs Cont? - is this the 4th column of fit1, depicted as log2 FoldChange?



    3) What would the p-values for each comparison be? As pvalsGLM and padjGLM gives only one set, and I don't think it will be the same for both comparisons.


    sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
    [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
      [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
      [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
      [7] LC_PAPER=C LC_NAME=C
      [9] LC_ADDRESS=C LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] gdata_2.12.0 WriteXLS_2.3.0 DESeq_1.10.1 lattice_0.20-6
    [5] locfit_1.5-8 Biobase_2.18.0 BiocGenerics_0.4.0

    loaded via a namespace (and not attached):
    [1] annotate_1.36.0 AnnotationDbi_1.20.2 DBI_0.2-5
      [4] genefilter_1.40.0 geneplotter_1.36.0 grid_2.15.2
      [7] gtools_2.7.0 IRanges_1.16.4 MASS_7.3-22
    [10] parallel_2.15.2 RColorBrewer_1.0-5 RSQLite_0.11.2
    [13] splines_2.15.2 stats4_2.15.2 survival_2.36-14
    [16] tcltk_2.15.2 tools_2.15.2 XML_3.95-0.1
    [19] xtable_1.7-0

    Many Thanks,
    Natasha
  • Michael Love at May 14, 2013 at 1:38 pm
    Hi Natasha,
    On Tue, May 14, 2013 at 1:30 PM, Natasha Sahgal wrote:

    head(fit1)
    (Intercept) pair3 conditionTrt1 conditionTrt2 deviance
    converged
    gene0 10.188336 -0.30184662 0.1698471 0.188197344 0.0741656
    TRUE
    gene1 9.992372 -0.32221027 0.1415406 0.001119304 0.1568869
    TRUE
    gene10 6.893693 0.06795915 -0.4954365 -0.270402107 0.9072335
    TRUE
    gene100 6.477847 0.04684995 0.1250743 -0.094116036 2.0611685
    TRUE
    gene1002 4.463446 0.15038277 0.3186187 0.106512332 0.3701619
    TRUE
    gene1003 4.090079 -0.05247162 -0.3560189 -0.054765642 1.9631768
    TRUE
    pvalsGLM = nbinomGLMTest( fit1, fit0 )
    padjGLM = p.adjust(pvalsGLM, method="BH" )
    ---------

    Based on the above how do I extract:

    1) Trt1 vs Cont? - is this the 3rd column of fit1, depicted as log2
    FoldChange?



    2) Trt2 vs Cont? - is this the 4th column of fit1, depicted as log2
    FoldChange?
    Yes, you are right. The 3rd and 4th columns of fit1 are the log2 fold
    changes for trt1 vs control and trt2 vs control.


    3) What would the p-values for each comparison be? As pvalsGLM and
    padjGLM gives only one set, and I don't think it will be the same for both
    comparisons.

    The p-values and adjusted p-values here are from a likelihood ratio test of
    a model including the condition and pair variables vs a model which only
    includes the pair variable. However, we can provide tests of significance
    of each treatment from a model including both now in the DESeq2 package. We
    tried to simplify the interface, so it shouldn't take too much time to set
    up the same analysis. Note that the methods have been updated from DESeq,
    and these changes are described in the vignette in Appendix A.

    To use DESeq2, you would create a DESeqDataSet and extract the results
    (log2 fold changes, p-values and adjusted p-values) for each treatment,
    with something like:

    library(DESeq2)
    dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design
    = ~ pair + condition)
    dds <- DESeq(dds)
    resTrt1 <- results(dds, "condition_Trt1_vs_Cont")
    resTrt2 <- results(dds, "condition_Trt2_vs_Cont")

    Mike
  • Natasha Sahgal at May 14, 2013 at 4:59 pm
    Dear Michael,

    Thank you for your quick response, I shall try what you say after upgrading R and relevant packages.

    Many Thanks,
    Natasha

    --


    From: Michael Love
    Sent: 14 May 2013 14:39
    To: Natasha Sahgal
    Cc: bioconductor@r-project.org
    Subject: Re: [BioC] DESeq multifactor design paired analysis

    Hi Natasha,
    On Tue, May 14, 2013 at 1:30 PM, Natasha Sahgal wrote:

    head(fit1)
              (Intercept) pair3 conditionTrt1 conditionTrt2 deviance converged
    gene0 10.188336 -0.30184662 0.1698471 0.188197344 0.0741656 TRUE
    gene1 9.992372 -0.32221027 0.1415406 0.001119304 0.1568869 TRUE
    gene10 6.893693 0.06795915 -0.4954365 -0.270402107 0.9072335 TRUE
    gene100 6.477847 0.04684995 0.1250743 -0.094116036 2.0611685 TRUE
    gene1002 4.463446 0.15038277 0.3186187 0.106512332 0.3701619 TRUE
    gene1003 4.090079 -0.05247162 -0.3560189 -0.054765642 1.9631768 TRUE
    pvalsGLM = nbinomGLMTest( fit1, fit0 )
    padjGLM = p.adjust(pvalsGLM, method="BH" )
    ---------

    Based on the above how do I extract:

    1) Trt1 vs Cont? - is this the 3rd column of fit1, depicted as log2 FoldChange?



    2) Trt2 vs Cont? - is this the 4th column of fit1, depicted as log2 FoldChange?

    Yes, you are right. The 3rd and 4th columns of fit1 are the log2 fold changes for trt1 vs control and trt2 vs control.



    3) What would the p-values for each comparison be? As pvalsGLM and padjGLM gives only one set, and I don't think it will be the same for both comparisons.



    The p-values and adjusted p-values here are from a likelihood ratio test of a model including the condition and pair variables vs a model which only includes the pair variable. However, we can provide tests of significance of each treatment from a model including both now in the DESeq2 package. We tried to simplify the interface, so it shouldn't take too much time to set up the same analysis. Note that the methods have been updated from DESeq, and these changes are described in the vignette in Appendix A.

    To use DESeq2, you would create a DESeqDataSet and extract the results (log2 fold changes, p-values and adjusted p-values) for each treatment, with something like:

    library(DESeq2)
    dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ pair + condition)
    dds <- DESeq(dds)
    resTrt1 <- results(dds, "condition_Trt1_vs_Cont")
    resTrt2 <- results(dds, "condition_Trt2_vs_Cont")

    Mike

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupbioconductor @
categoriesr
postedMay 14, '13 at 11:28a
activeMay 14, '13 at 4:59p
posts4
users2
websitebioconductor.org
irc#r

2 users in discussion

Natasha Sahgal: 3 posts Michael Love: 1 post

People

Translate

site design / logo © 2018 Grokbase