r-devel: read.table segfaults (August 2011)
> fil2s <- read.table("../Data/fil2_s.txt", header = FALSE, sep = "\t")
Program received signal SIGSEGV, Segmentation fault.
0x000000000041c2e1 in RunGenCollect (size_needed=??92000) at memory.c:1514
1514        PROCESS_NODES();
(gdb)
> sessionInfo()
R version 2.13.1 Patched (2011-08-25 r56798)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base
>

The text file 'fil2_s.txt' is huge, around 11 million records and 17
variables, but ...?



--
Göran Broström


  • Scott at Aug 26, 2011 at 7:41 pm
    It does look like you've got a memory issue. Perhaps adding
    as.is=TRUE and/or stringsAsFactors=FALSE as optional arguments to
    read.table will help.

    If you don't specify these sorts of things, R may have to look through the
    file and figure out which columns are characters/factors etc., so
    larger files cause more of a headache for R, I'm guessing. Hopefully someone
    else can comment further on this. I'd try toggling TRUE/FALSE for as.is and
    stringsAsFactors.

    Do you have other objects loaded in memory as well? This file by itself
    might not be the problem; it could be a cumulative issue.
    Have you checked the file structure in any other manner?
    How large (MB/KB) is the file that you're trying to read?
    If you just read in parts of the file, is it okay?
    read.table(filename, header=FALSE, sep="\t", nrows=100)
    read.table(filename, header=FALSE, sep="\t", skip=20000, nrows=100)
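
    For concreteness, a minimal sketch of the ideas above, with explicit
    column types so read.table need not guess (the type vector is an
    assumption; adjust it to the real 17 columns):

    cc <- rep("numeric", 17)   # assumed: all 17 columns numeric
    part1 <- read.table("../Data/fil2_s.txt", sep = "\t", header = FALSE,
                        colClasses = cc, nrows = 1e6)
    part2 <- read.table("../Data/fil2_s.txt", sep = "\t", header = FALSE,
                        colClasses = cc, skip = 1e6, nrows = 1e6)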



  • Ben Bolker at Aug 26, 2011 at 9:55 pm
    Scott <ncbi2r <at> googlemail.com> writes:
    > It does look like you've got a memory issue. Perhaps adding
    > as.is=TRUE and/or stringsAsFactors=FALSE as optional arguments to
    > read.table will help.
    >
    > If you don't specify these sorts of things, R may have to look through the
    > file and figure out which columns are characters/factors etc., so
    > larger files cause more of a headache for R, I'm guessing. Hopefully someone
    > else can comment further on this. I'd try toggling TRUE/FALSE for as.is and
    > stringsAsFactors.
    >
    > Do you have other objects loaded in memory as well? This file by itself
    > might not be the problem; it could be a cumulative issue.
    > Have you checked the file structure in any other manner?
    > How large (MB/KB) is the file that you're trying to read?
    > If you just read in parts of the file, is it okay?
    > read.table(filename, header=FALSE, sep="\t", nrows=100)
    > read.table(filename, header=FALSE, sep="\t", skip=20000, nrows=100)

    There seem to be two issues here:

    1. what can the original poster (OP) do to work around this problem?
    (e.g. get the data into a relational data base and import it from
    there; use something from the High Performance task view such as
    ff or data.table; see the sketch after point 2 ...)

    2. reporting a bug -- according to the R FAQ, any low-level
    (segmentation-fault-type) crash of R when one is not messing
    around with dynamically loaded code constitutes a bug. Unfortunately,
    debugging problems like this is a huge pain in the butt.
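
    A minimal sketch of the ff route from point 1, assuming the ff package
    is installed; read.table.ffdf reads the file in batches and keeps the
    result on disk rather than in RAM:

    library(ff)
    fil2s <- read.table.ffdf(file = "../Data/fil2_s.txt",
                             header = FALSE, sep = "\t")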

    Goran, can you randomly or systematically generate an
    object of this size, write it to disk, read it back in, and
    generate the same error? In other words, does something like

    set.seed(1001)
    d <- data.frame(label=rep(LETTERS[1:11],1e6),
                    values=matrix(rep(1.0,11*17*1e6),ncol=17))
    write.table(d,file="big.txt")
    read.table("big.txt")

    do the same thing?

    Reducing it to this kind of reproducible example will make
    it possible for others to debug it without needing to gain
    access to your huge file ...
  • Göran Broström at Aug 27, 2011 at 9:27 am

    On Fri, Aug 26, 2011 at 11:55 PM, Ben Bolker wrote:
    > Scott <ncbi2r <at> googlemail.com> writes:
    >> It does look like you've got a memory issue. Perhaps adding
    >> as.is=TRUE and/or stringsAsFactors=FALSE as optional arguments to
    >> read.table will help.
    >>
    >> If you don't specify these sorts of things, R may have to look through the
    >> file and figure out which columns are characters/factors etc., so
    >> larger files cause more of a headache for R, I'm guessing. Hopefully someone
    >> else can comment further on this. I'd try toggling TRUE/FALSE for as.is and
    >> stringsAsFactors.
    >>
    >> Do you have other objects loaded in memory as well? This file by itself
    >> might not be the problem; it could be a cumulative issue.
    >> Have you checked the file structure in any other manner?
    >> How large (MB/KB) is the file that you're trying to read?
    >> If you just read in parts of the file, is it okay?
    >> read.table(filename, header=FALSE, sep="\t", nrows=100)
    >> read.table(filename, header=FALSE, sep="\t", skip=20000, nrows=100)
    >
    > There seem to be two issues here:
    >
    > 1. what can the original poster (OP) do to work around this problem?
    > (e.g. get the data into a relational data base and import it from
    > there; use something from the High Performance task view such as
    > ff or data.table ...)

    Interestingly, the text file was created by a selection from an SQL
    data base. I have access to 'db2' on an Ubuntu machine; I run, at the
    bash prompt,

    $ db2 < file2.sql

    where file2.sql contains

    connect to linnedb user goran using xxxxxxxxxxx
    export to '/home/goran/ALC/SQL/fil2_s.txt' of del modified by coldelX09
    select linneid, fodelsear, kon, ....... from u09021.fil2
    connect reset

    How do I get a direct connection between R and the data base 'linnedb'?
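
    A minimal sketch of one way to do that, assuming a DB2 ODBC driver and
    a data source named "linnedb" are configured on the machine (both are
    assumptions), using the RODBC package:

    library(RODBC)
    con <- odbcConnect("linnedb", uid = "goran", pwd = "xxxxxxxxxxx")
    ## column list abbreviated, as in the SQL script above
    fil2 <- sqlQuery(con, "select linneid, fodelsear, kon from u09021.fil2")
    odbcClose(con)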
    > 2. reporting a bug -- according to the R FAQ, any low-level
    > (segmentation-fault-type) crash of R when one is not messing
    > around with dynamically loaded code constitutes a bug. Unfortunately,
    > debugging problems like this is a huge pain in the butt.
    >
    > Goran, can you randomly or systematically generate an
    > object of this size, write it to disk, read it back in, and
    > generate the same error? In other words, does something like
    >
    > set.seed(1001)
    > d <- data.frame(label=rep(LETTERS[1:11],1e6),
    >                 values=matrix(rep(1.0,11*17*1e6),ncol=17))
    > write.table(d,file="big.txt")
    > read.table("big.txt")
    >
    > do the same thing?

    No, but I get new errors:
    ss <- read.table("big.txt")
    Error in read.table("big.txt") : duplicate 'row.names' are not allowed

    (there are no duplicates)

    I tried to add an item to the first line and
    ss <- read.table("big.txt", header = TRUE)
    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
    line 10610008 did not have 19 elements

    which is wrong; that line has 19 elements.
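
    A small sketch for checking both claims from within R, assuming
    write.table's default space separator and quoting (count.fields is in
    base R):

    nf <- count.fields("big.txt", sep = "")
    table(nf)             # distribution of fields per line
    which(nf != 19)[1:5]  # first lines whose field count differs
    ## read again without treating the first column as row names:
    ss <- read.table("big.txt", header = TRUE, row.names = NULL)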

    Göran
    > Reducing it to this kind of reproducible example will make
    > it possible for others to debug it without needing to gain
    > access to your huge file ...



    --
    Göran Broström
  • Göran Broström at Aug 27, 2011 at 9:49 am

    On Fri, Aug 26, 2011 at 9:41 PM, Scott wrote:
    > It does look like you've got a memory issue. Perhaps adding
    > as.is=TRUE and/or stringsAsFactors=FALSE as optional arguments to
    > read.table will help.
    >
    > If you don't specify these sorts of things, R may have to look through the
    > file and figure out which columns are characters/factors etc., so
    > larger files cause more of a headache for R, I'm guessing. Hopefully someone
    > else can comment further on this. I'd try toggling TRUE/FALSE for as.is and
    > stringsAsFactors.
    >
    > Do you have other objects loaded in memory as well? This file by itself
    > might not be the problem; it could be a cumulative issue.
    > Have you checked the file structure in any other manner?
    > How large (MB/KB) is the file that you're trying to read?
    > If you just read in parts of the file, is it okay?
    > read.table(filename, header=FALSE, sep="\t", nrows=100)
    > read.table(filename, header=FALSE, sep="\t", skip=20000, nrows=100)

    Today, after a night's sleep, there are no segfaults! (The computer
    also slept, I turned it off.) So what is going on? Maybe I shouldn't
    bother.... but I installed the latest patched version yesterday,
    immediately tried to read the file with a segfault as a result, turned
    the machine off and on, and no problems. Do we need to reboot after a
    new install (note, this is not Windows)?

    Göran





    --
    Göran Broström
  • Göran Broström at Aug 26, 2011 at 8:10 pm
    Another one:

    The 'death.RData' was created about a year ago, but ...? Same info as below.

    Göran
    load("../Data/death.RData")
    summary(death)
    *** caught segfault ***
    address 0x40000e04959, cause 'memory not mapped'

    Traceback:
    1: match(x, levels)
    2: factor(a, levels = ll[!(ll %in% exclude)], exclude = if (useNA == "no") NA)
    3: table(object)
    4: summary.factor(X[[6L]], ...)
    5: FUN(X[[6L]], ...)
    6: lapply(as.list(object), summary, maxsum = maxsum, digits = 12, ...)
    7: summary.data.frame(death)
    8: summary(death)

    Possible actions:
    1: abort (with core dump, if enabled)
    2: normal R exit
    3: exit R without saving workspace
    4: exit R saving workspace
    Selection:
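
    A minimal sketch for narrowing this down, assuming 'death' can still be
    loaded: summarize one column at a time, so the failing column (the
    traceback points at the sixth) identifies itself before the crash:

    load("../Data/death.RData")
    for (nm in names(death)) {
        cat(nm, "\n")
        print(summary(death[[nm]]))
    }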


    2011/8/26 Göran Broström <goran.brostrom@gmail.com>:
    > > fil2s <- read.table("../Data/fil2_s.txt", header = FALSE, sep = "\t")
    >
    > Program received signal SIGSEGV, Segmentation fault.
    > 0x000000000041c2e1 in RunGenCollect (size_needed=??92000) at memory.c:1514
    > 1514        PROCESS_NODES();
    > (gdb)
    >
    > > sessionInfo()
    > R version 2.13.1 Patched (2011-08-25 r56798)
    > Platform: x86_64-unknown-linux-gnu (64-bit)
    >
    > locale:
    >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
    >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
    >  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
    >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
    >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
    > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
    >
    > attached base packages:
    > [1] stats     graphics  grDevices utils     datasets  methods   base
    > >
    >
    > The text file 'fil2_s.txt' is huge, around 11 million records and 17
    > variables, but ...?
    >
    > --
    > Göran Broström


    --
    Göran Broström
  • Göran Broström at Aug 26, 2011 at 8:16 pm
    One further note:

    No problem with R version 2.13.0 (2011-04-13)

    Göran

    2011/8/26 Göran Broström <goran.brostrom@gmail.com>:
    > Another one:
    >
    > The 'death.RData' was created about a year ago, but ...? Same info as below.
    >
    > Göran
    > load("../Data/death.RData")
    > summary(death)
    >
    > *** caught segfault ***
    > address 0x40000e04959, cause 'memory not mapped'
    >
    > Traceback:
    >  1: match(x, levels)
    >  2: factor(a, levels = ll[!(ll %in% exclude)], exclude = if (useNA == "no") NA)
    >  3: table(object)
    >  4: summary.factor(X[[6L]], ...)
    >  5: FUN(X[[6L]], ...)
    >  6: lapply(as.list(object), summary, maxsum = maxsum, digits = 12, ...)
    >  7: summary.data.frame(death)
    >  8: summary(death)
    >
    > Possible actions:
    > 1: abort (with core dump, if enabled)
    > 2: normal R exit
    > 3: exit R without saving workspace
    > 4: exit R saving workspace
    > Selection:


    > 2011/8/26 Göran Broström <goran.brostrom@gmail.com>:
    >> > fil2s <- read.table("../Data/fil2_s.txt", header = FALSE, sep = "\t")
    >>
    >> Program received signal SIGSEGV, Segmentation fault.
    >> 0x000000000041c2e1 in RunGenCollect (size_needed=??92000) at memory.c:1514
    >> 1514        PROCESS_NODES();
    >> (gdb)
    >>
    >> > sessionInfo()
    >> R version 2.13.1 Patched (2011-08-25 r56798)
    >> Platform: x86_64-unknown-linux-gnu (64-bit)
    >>
    >> locale:
    >>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
    >>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
    >>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
    >>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
    >>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
    >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
    >>
    >> attached base packages:
    >> [1] stats     graphics  grDevices utils     datasets  methods   base
    >> >
    >>
    >> The text file 'fil2_s.txt' is huge, around 11 million records and 17
    >> variables, but ...?
    >>
    >> --
    >> Göran Broström


    > --
    > Göran Broström


    --
    Göran Broström
