Grokbase Groups R r-help August 2012
One set of data has censored (less-than detection limits) water chemistry
concentrations for 80-100% of all observations. My initial trial-and-error
attempts to apply the cenboxplot() method suggest that it has an upper
limit to the percentage of censored observations. I do not see this limit in
Dennis Helsel's second edition.

Has anyone experience plotting censored data and can provide me with the
maximum percentage of censored data in a set of observations?

TIA,

Rich


  • David L Lorenz at Aug 16, 2012 at 12:29 pm
    Rich,
    The cenboxplot function uses cenros to estimate the censored values. The
    cenros function requires at least 2 uncensored observations to be able to
    do the regression. The cenros function does issue a warning when there are
    more than 80% censored data, but that is suppressed in cenboxplot.
    Hope this helps.
    Dave

    Date: Wed, 15 Aug 2012 14:28:54 -0700 (PDT)
    From: Rich Shepard <rshepard@appl-ecosys.com>
    To: r-help@r-project.org
    Subject: [R] NADA package/cenboxplot() method: maximum censored percentage
  • Rich Shepard at Aug 16, 2012 at 3:08 pm

    On Thu, 16 Aug 2012, David L Lorenz wrote:

    > The cenboxplot function uses cenros to estimate the censored values. The
    > cenros function requires at least 2 uncensored observations to be able to
    > do the regression. The cenros function does issue a warning when there are
    > more than 80% censored data, but that is suppressed in cenboxplot.

    Dave,

    I've seen the cenros warning when there are > 80% censored data, and now
    understand that I don't see the same warning with cenboxplot. Knowing now
    that I need at least 2 uncensored observations answers my question; some of
    the constituents have only a single uncensored value. Others have none, and
    I knew those could not be analyzed or plotted.

    > Hope this helps.

    Most certainly does!

    Thanks very much,

    Rich
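
The two limits discussed above (at least 2 uncensored observations for cenros, and a warning above 80% censored) can be checked per group before plotting. The following is only a sketch in base R: `censor_summary()` is a hypothetical helper, not part of NADA, and the data are made up.

```r
# Summarise censoring per group before calling cenboxplot().
# censor_summary() is a hypothetical helper, not a NADA function.
censor_summary <- function(obs, censored, group = NULL) {
  if (is.null(group)) group <- rep("all", length(obs))
  group <- factor(group)                       # factor() drops unused levels
  n     <- tapply(obs, group, length)          # observations per group
  n_cen <- tapply(censored, group, sum)        # censored observations per group
  data.frame(
    group        = names(n),
    n            = as.vector(n),
    n_uncensored = as.vector(n - n_cen),
    frac_cen     = as.vector(n_cen / n),
    enough_obs   = as.vector(n - n_cen) >= 2,  # minimum cited for cenros
    over_80pct   = as.vector(n_cen / n) > 0.8, # threshold of the cenros warning
    row.names    = NULL
  )
}

# Made-up data: group "B" has only a single uncensored value.
obs      <- c(0.004, 0.005, 0.003, 0.006, 0.001, 0.001, 0.001, 0.002)
censored <- c(FALSE, FALSE, TRUE,  FALSE, TRUE,  TRUE,  TRUE,  FALSE)
era      <- c("A", "A", "A", "A", "B", "B", "B", "B")

s <- censor_summary(obs, censored, era)
print(s)
```

Groups where `enough_obs` is FALSE would be the ones to leave out of the plot.
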
  • Rich Shepard at Aug 20, 2012 at 4:22 pm

    On Thu, 16 Aug 2012, David L Lorenz wrote:

    > The cenboxplot function uses cenros to estimate the censored values. The
    > cenros function requires at least 2 uncensored observations to be able to
    > do the regression. The cenros function does issue a warning when there are
    > more than 80% censored data, but that is suppressed in cenboxplot.

    There must be something other than too few uncensored observations in my
    data that prevents cenboxplot from functioning. For example, dissolved arsenic
    concentrations have 578 total observations. Of these, 180 (31.14%) are
    censored and 398 are uncensored. Both the number of uncensored observations
    and the percentage of censored observations appear to be well within
    plottable limits, but cenboxplot() returns this error:

    cenboxplot(as.d$quant, as.d$ceneq1, as.d$era, range=1.5,
               main='Dissolved Arsenic', ylab='Concentration (mg/L)',
               xlab='Time Period')
    Error in if ((length(obs[censored])/length(obs)) > 0.8) { :
      missing value where TRUE/FALSE needed

    I would like to understand how the function obtains a censored ratio > 0.8
    when it is actually 0.3114.

    Displaying the data frame, as.d, shows a logical TRUE or FALSE for each row;
    it can be provided if needed.

    Rich
  • David Winsemius at Aug 20, 2012 at 5:14 pm

    On Aug 20, 2012, at 9:22 AM, Rich Shepard wrote:

    > I would like to understand how the function obtains a censored
    > ratio > 0.8 when it is actually 0.3114.

    I see no evidence from what you have posted that the function "obtains
    a censored ratio > 0.8". The error message says there were missing
    values. You might consider looking at:

    table(Q=is.na(as.d$quant), CE=is.na(as.d$ceneq1), ERA=as.d$era)

    My guess is that this test is failing for one or more categories of
    as.d$era.

    > Displaying the data frame, as.d, has a logical TRUE or FALSE for
    > each row; it can be provided if needed.

    You could use dput() and post through Nabble, which would let people
    download if they were interested. Seems on the large side to be
    considered "minimal".

    --

    David Winsemius, MD
    Alameda, CA, USA
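
The cross-tabulation suggested above counts missing values in each column, split by era. Here is a minimal sketch on made-up data; the data frame `d` is a hypothetical stand-in for as.d, with one missing `quant` value added on purpose.

```r
# Hypothetical stand-in for as.d; one quant value is deliberately missing.
d <- data.frame(
  quant  = c(0.004, NA, 0.005, 0.003, 0.006),
  ceneq1 = c(FALSE, TRUE, FALSE, TRUE, FALSE),
  era    = factor(c("Pre-mining", "Pre-mining", "Mining", "Mining", "Mining"))
)

# Same call shape as in the post: NA counts in quant and ceneq1, split by era.
na_tab <- table(Q = is.na(d$quant), CE = is.na(d$ceneq1), ERA = d$era)
print(na_tab)

# Rows per era; a level with no rows would appear here with a count of 0.
era_counts <- table(d$era)
print(era_counts)
```

Any non-zero cell in a `TRUE` row or column of `na_tab` points to missing values in that era.
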
  • William Dunlap at Aug 20, 2012 at 5:27 pm
    You get this error if the factor given as the group argument
    has any unused factor levels. E.g.,
    library(NADA)
    data(Golden)
    with(Golden, cenboxplot(Blood, BloodCen, factor(DosageGroup,levels=c("Low","High","<unused>"))))
    Error in if ((length(obs[censored])/length(obs)) > 0.8) { :
    missing value where TRUE/FALSE needed

    (0/0 is NaN, which triggers the 'missing value where TRUE/FALSE needed' message.)

    You should complain to the maintainer of the NADA package - the test should be
    more like 'length(obs)>0 && ...'.

    To work around it use factor(group) instead of group when calling cenboxplot.

    Bill Dunlap
    Spotfire, TIBCO Software
    wdunlap tibco.com
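
The mechanism Bill describes can be reproduced in base R without NADA. This is only a sketch: the unguarded expression is copied from the error message, and `too_censored()` is a hypothetical name for a guarded version along the lines he suggests, not the package's actual code.

```r
# With an empty group, the censored fraction is 0/0.
obs      <- numeric(0)
censored <- logical(0)
ratio <- length(obs[censored]) / length(obs)
print(ratio)          # NaN
print(ratio > 0.8)    # NA, which if() cannot use as a condition

# The unguarded test fails; capture the error message instead of stopping.
msg <- tryCatch(
  { if (ratio > 0.8) "warn" else "ok" },
  error = function(e) conditionMessage(e)
)
print(msg)

# Guarded form: && short-circuits, so the division is never
# evaluated for an empty group.
too_censored <- function(obs, censored) {
  length(obs) > 0 && (length(obs[censored]) / length(obs)) > 0.8
}
print(too_censored(obs, censored))
print(too_censored(c(1, 1, 1, 1, 1, 2),
                   c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)))
```
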

  • Rich Shepard at Aug 20, 2012 at 5:30 pm

    On Mon, 20 Aug 2012, David Winsemius wrote:

    > I see no evidence from what you have posted that the function "obtains a
    > censored ratio > 0.8". The error message says there were missing values.
    > You might consider looking at:

    David,

    Missing values were removed from the data before reading into R.

    > You could use dput() and post through Nabble, which would let people
    > download if they were interested. Seems on the large side to be considered
    > "minimal".

    OK.

    Rich
  • Rich Shepard at Aug 20, 2012 at 5:49 pm

    On Mon, 20 Aug 2012, William Dunlap wrote:

    > You get this error if the factor given as the group argument
    > has any unused factor levels. E.g.,
    > with(Golden, cenboxplot(Blood, BloodCen, factor(DosageGroup,levels=c("Low","High","<unused>"))))
    > Error in if ((length(obs[censored])/length(obs)) > 0.8) { :
    > missing value where TRUE/FALSE needed

    Bill,

    There are only two grouping levels: pre-mining and mining:

    str(as.d)
    'data.frame': 578 obs. of 8 variables:
    $ site : Factor w/ 64 levels "D-1","D-2","D-3",..: 12 12 12 12 12 12 12 12 ...
    $ sampdate: Date, format: "1993-01-21" "1993-02-11" ...
    $ era : Factor w/ 2 levels "Mining","Pre-mining": 2 2 2 2 2 2 2 2 ...
    $ param : Factor w/ 64 levels "AgDis","AgTot",..: 6 6 6 6 6 6 6 6 6 6 ...
    $ quant : num 0.004 0.004 0.005 0.005 0.003 0.006 0.005 0.004 0.003 ...
    $ ceneq1 : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
    $ floor : num 0.004 0.004 0.005 0.005 0.003 0.006 0.005 0.004 0.003 ...
    $ ceiling : num 0.004 0.004 0.005 0.005 0.003 0.006 0.005 0.004 0.003 ...

    > To work around it use factor(group) instead of group when calling cenboxplot.

    Since era is already a factor this throws another error.

    The gzipped output of dput is attached.

    Thanks,

    Rich
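
str() reports how many levels a factor declares, not how many rows fall in each one, so a two-level factor can still have an empty level. The sketch below uses made-up data (not the attached as.d) to show one way to check for that, and that factor() accepts an existing factor. Whether this applies to Rich's data is not established in the thread.

```r
# Made-up factor with two declared levels but rows for only one of them.
era <- factor(c("Pre-mining", "Pre-mining", "Pre-mining"),
              levels = c("Mining", "Pre-mining"))

print(nlevels(era))    # 2: the declared levels, which is what str() reports
print(table(era))      # shows a count of 0 for "Mining"

# Either call returns a factor without the empty level.
era_used  <- factor(era)
era_used2 <- droplevels(era)
print(levels(era_used))
```
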
