Hi,

Has anyone ever had a case where a SNP was not found in
SNPlocs.Hsapiens.dbSNP.20120608, but is found in dbSNP 137? I am having
this problem with SNP rs7775397.

library(SNPlocs.Hsapiens.dbSNP.20120608)
rsidsToGRanges('rs7775397')
Error in .snpid2rowidx(x, snpid) : SNP id(s) not found: 7775397

Thanks,
Christina
sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] datasets utils grDevices graphics stats methods base

other attached packages:
[1] SNPlocs.Hsapiens.dbSNP.20120608_0.99.8
[2] BSgenome_1.26.1
[3] Biostrings_2.26.2
[4] GenomicRanges_1.10.5
[5] IRanges_1.16.4
[6] BiocGenerics_0.4.0

loaded via a namespace (and not attached):
[1] parallel_2.15.2 stats4_2.15.2

--
Christina Chaivorapol, Ph.D.
Genentech, Inc.
Bioinformatics & Computational Biology


  • Hervé Pagès at Jan 16, 2013 at 7:00 am
    Hi Christina,


    According to the official announcement:




    http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2012q2/000122.html


    there are 53,558,214 rs ids in dbSNP 137 for Human.


    But in SNPlocs.Hsapiens.dbSNP.20120608:

    library(SNPlocs.Hsapiens.dbSNP.20120608)
    sum(getSNPcount())
    [1] 45416711
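
To put the two counts side by side, here is a small arithmetic sketch in base R. It uses only the two numbers quoted above; the variable names are made up for illustration, and the comparison is rough because the announcement counts rs ids while the package counts SNPs it kept.

```r
# Counts quoted in this thread.
dbsnp137_total <- 53558214  # rs ids announced for Human in dbSNP 137
snplocs_kept   <- 45416711  # sum(getSNPcount()) reported for version 0.99.8

# How many rs ids have no entry in the package, and what share that is.
dropped     <- dbsnp137_total - snplocs_kept
pct_dropped <- round(100 * dropped / dbsnp137_total, 1)

cat("rs ids without an entry in the package:", dropped, "\n")
cat("share of dbSNP 137 rs ids (percent):", pct_dropped, "\n")
```

So roughly 8.1 million rs ids, about 15% of the announced total, have no entry in the package.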


    As explained in ?SNPlocs.Hsapiens.dbSNP.20120608, the package (like
    all other SNPlocs packages) was curated:


    SNPs from dbSNP were filtered to keep only those satisfying the 3
    following criteria:


    • The SNP is a single-base substitution i.e. its type is "snp".
    Other types used by dbSNP are: "in-del", "mixed",
    "microsatellite", "named-locus",
    "multinucleotide-polymorphism", etc... All those SNPs were
    dropped.


    • The SNP is marked as notwithdrawn.


    • A *single* location on the reference genome (GRCh37.p5) is
    reported for the SNP, and this location is on chromosomes
    1-22, X, Y, MT.
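
The three criteria above can be sketched as a filter over a small table in base R. This is only an illustration: the data frame, its column names (`type`, `withdrawn`, `n_locations`, `chr`) and its rows are invented here and are not the package's actual build code or the dbSNP file layout.

```r
# Toy table: every row and column name is hypothetical.
snps <- data.frame(
  rsid        = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  type        = c("snp", "in-del", "snp", "snp", "snp"),
  withdrawn   = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  n_locations = c(1L, 1L, 1L, 6L, 1L),
  chr         = c("6", "6", "1", "6", "Un"),
  stringsAsFactors = FALSE
)

allowed_chr <- c(as.character(1:22), "X", "Y", "MT")

keep <- snps$type == "snp" &         # criterion 1: single-base substitution
  !snps$withdrawn &                  # criterion 2: not withdrawn
  snps$n_locations == 1L &           # criterion 3: a single location ...
  snps$chr %in% allowed_chr          # ... on chromosomes 1-22, X, Y, MT

kept <- snps[keep, ]
print(kept$rsid)
```

In this toy data only the first row passes all three tests; the fourth row is dropped for the same kind of reason as rs7775397 (more than one reported location).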


    In the case of rs7775397, it was dropped because of this last reason.
    More precisely, the record in ds_flat_ch6.flat for this SNP contains
    the following CTG lines:


    CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=32261252 | NT_007592.15 |
    ctg-start=32201252 | ctg-end=32201252 | loctype=2 | orient=+
    CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_113891.2 |
    ctg-start=3732030 | ctg-end=3732030 | loctype=2 | orient=+
    CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167245.1 |
    ctg-start=3540499 | ctg-end=3540499 | loctype=2 | orient=+
    CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167246.1 |
    ctg-start=3604088 | ctg-end=3604088 | loctype=2 | orient=+
    CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167248.1 |
    ctg-start=3522471 | ctg-end=3522471 | loctype=2 | orient=+
    CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167249.1 |
    ctg-start=3609047 | ctg-end=3609047 | loctype=2 | orient=+


    That is, more than 1 CTG line corresponding to the reference assembly
    (GRCh37.p5). This is the reason why the SNP was dropped.
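
For readers who want to check this on the flat-file record themselves, here is a minimal base R sketch that parses the six CTG lines for rs7775397 (each record joined onto one line) and counts how many refer to the reference assembly and how many carry a chromosome position. The helper `get_field()` is a hypothetical name written for this example; it is not part of any SNPlocs or dbSNP tooling.

```r
# The six CTG lines for rs7775397, one record per string.
ctg <- c(
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=32261252 | NT_007592.15 | ctg-start=32201252 | ctg-end=32201252 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_113891.2 | ctg-start=3732030 | ctg-end=3732030 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167245.1 | ctg-start=3540499 | ctg-end=3540499 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167246.1 | ctg-start=3604088 | ctg-end=3604088 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167248.1 | ctg-start=3522471 | ctg-end=3522471 | loctype=2 | orient=+",
  "CTG | assembly=GRCh37.p5 | chr=6 | chr-pos=? | NT_167249.1 | ctg-start=3609047 | ctg-end=3609047 | loctype=2 | orient=+"
)

# Hypothetical helper: split a line on "|" and return the value of one
# key=value field, or NA if the key is absent.
get_field <- function(line, key) {
  parts <- trimws(strsplit(line, "|", fixed = TRUE)[[1]])
  hit <- parts[startsWith(parts, paste0(key, "="))]
  if (length(hit) == 0L) NA_character_ else sub("^[^=]*=", "", hit[1])
}

assembly <- vapply(ctg, get_field, character(1), key = "assembly", USE.NAMES = FALSE)
chr_pos  <- vapply(ctg, get_field, character(1), key = "chr-pos", USE.NAMES = FALSE)

on_reference <- assembly == "GRCh37.p5"
n_ref_lines  <- sum(on_reference)                     # CTG lines on the reference assembly
mapped_pos   <- chr_pos[on_reference & chr_pos != "?"] # lines with an actual position

cat("CTG lines on GRCh37.p5:", n_ref_lines, "\n")
cat("chr-pos values reported:", mapped_pos, "\n")
```

All six lines belong to GRCh37.p5, but only the first one reports a chromosome position.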


    I realize now that maybe I could keep those SNPs that have more than
    1 CTG line corresponding to the reference assembly as long as exactly
    1 of them actually provides a value for the chr-pos field. Would that
    be reasonable?
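
The difference between the current rule and the proposed one can be sketched in base R as two small predicates. Both functions are hypothetical names used only here, and they are one reading of the rules as described in this thread, not the package's build code. Each takes the chr-pos values from a SNP's CTG lines on the reference assembly, with "?" meaning no position was reported.

```r
# Current rule (as described above): exactly one CTG line on the reference
# assembly, and it reports a position.
keep_strict <- function(chr_pos) {
  length(chr_pos) == 1L && chr_pos != "?"
}

# Proposed rule: any number of CTG lines, as long as exactly one of them
# reports a position.
keep_relaxed <- function(chr_pos) {
  sum(chr_pos != "?") == 1L
}

rs7775397     <- c("32261252", "?", "?", "?", "?", "?")
two_positions <- c("1000", "2000", "?")  # hypothetical SNP with two real positions

cat("rs7775397, strict: ", keep_strict(rs7775397), "\n")
cat("rs7775397, relaxed:", keep_relaxed(rs7775397), "\n")
cat("two positions, relaxed:", keep_relaxed(two_positions), "\n")
```

Under this reading, rs7775397 fails the strict rule but passes the relaxed one, while a SNP with two real positions is still dropped by both.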


    Thanks,
    H.



    --
    Hervé Pagès


    Program in Computational Biology
    Division of Public Health Sciences
    Fred Hutchinson Cancer Research Center
    1100 Fairview Ave. N, M1-B514
    P.O. Box 19024
    Seattle, WA 98109-1024


    E-mail: hpages at fhcrc.org
    Phone: (206) 667-5791
    Fax: (206) 667-1319
  • Christina Chaivorapol at Jan 16, 2013 at 6:15 pm
    Thanks for your help, Tim and Herve.

    For my purposes, it would be very useful to include the SNPs that have
    a value for the chr-pos field even if they have more than 1 CTG line,
    since I deal with a lot of immune-related genes that tend to be
    difficult to map. Would it be possible to include these types of SNPs,
    but flag them as having more than 1 CTG line?

    Thanks for your help,
    Christina


    --
    Christina Chaivorapol, Ph.D.
    Genentech, Inc.
    Bioinformatics & Computational Biology
    phone: 650-225-6903
    chrichai@gene.com
  • Hervé Pagès at Jan 17, 2013 at 8:53 pm
    Hi Christina,

    On 01/16/2013 10:15 AM, Christina Chaivorapol wrote:
    Thanks for your help, Tim and Herve.

    For my purposes, it would be very useful to include the SNPs that have
    a value for the chr-pos field even if they have more than 1 CTG line,
    since I deal with a lot of immune-related genes that tend to be
    difficult to map. Would it be possible to include these types of SNPs,
    but flag them as having more than 1 CTG line?

    So I've included them in version 0.99.9 of SNPlocs.Hsapiens.dbSNP.20120608.
    They're not flagged though. Note that there is still a *single*
    location on the reference genome reported for those SNPs,
    because the other "locations" are reported as ? (question mark)
    and it seems fair not to consider ? a location.


    With this new version of the package:

    library(SNPlocs.Hsapiens.dbSNP.20120608)
    sum(getSNPcount())
    [1] 45697775


    that is, 281064 more SNPs (i.e. 0.6%) compared to the previous version
    (i.e. 0.99.8). rs7775397 is one of them now:

    rsidsToGRanges("rs7775397")
    GRanges with 1 range and 2 metadata columns:
          seqnames               ranges strand |   RefSNP_id alleles_as_ambig
             <Rle>            <IRanges>  <Rle> | <character>      <character>
      [1]      ch6 [32261252, 32261252]      + |     7775397                K
      ---
      seqlengths:
             ch1       ch2       ch3       ch4 ...       chX       chY      chMT
       249250621 243199373 198022430 191154276 ... 155270560  59373566     16569
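
The increase quoted above can be checked with a line or two of base R. This sketch uses only the two totals reported in this thread; the variable names are made up for illustration.

```r
old_count <- 45416711  # sum(getSNPcount()) with version 0.99.8
new_count <- 45697775  # sum(getSNPcount()) with version 0.99.9

added     <- new_count - old_count
pct_added <- round(100 * added / old_count, 1)

cat("SNPs added:", added, "\n")
cat("increase over 0.99.8 (percent):", pct_added, "\n")
```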


    SNPlocs.Hsapiens.dbSNP.20120608 version 0.99.9 will be available in
    Bioc devel (requires devel version of R i.e. R 3.0) thru biocLite() in
    about 45 min. Only the source package for now, which you should be
    able to install on Windows or Mac with biocLite( , type="source").


    Let me know if you have questions about this.


    Cheers,
    H.


Discussion Overview
group: bioconductor
categories: r
posted: Jan 16, '13 at 1:19a
active: Jan 17, '13 at 8:53p
posts: 4
users: 2
website: bioconductor.org