Grokbase Groups R r-help May 2012
FAQ
Dear list,

I'm a bit perplexed why the 95% confidence bands for the predicted probabilities for units where x=0 and x=1 overlap in the following instance.

I've simulated binary data to which I've then fitted a simple logistic regression model, with one covariate, and the coefficient on x is statistically significant at the 0.05 level. I've then used two different methods to generate 95% confidence bands for predicted probabilities, for each of two possible values of x. Given the result of the model, I expected the bands not to overlap? but they do.

Can anyone please explain where I've gone wrong with the following code, OR why it makes sense for the bands to overlap, despite the statistically significant beta coefficient? There may be a good statistical reason for this, but I'm not aware of it.

Many thanks,
Malcolm Fairbrother


n <- 120
set.seed(030512)
x <- rbinom(n, 1, 0.5)
dat <- within(data.frame(x), ybe <- rbinom(n, 1, plogis(-0.5 + x)))
mod1 <- glm(ybe ~ x, dat, family=binomial)
summary(mod1) # coefficient on x is statistically significant at the 0.05 level? almost at the 0.01 level

pred <- predict(mod1, newdata=data.frame(x=c(0,1)), se.fit=T)
with(pred, cbind(low = plogis(fit - 1.96*se.fit), est = plogis(fit), up = plogis(fit + 1.96*se.fit))) # confidence bands based on SEs

# simulation-based confidence bands:
sims <- t(replicate(200, coef(glm(simulate(mod1)$sim_1 ~ x, data=dat, family=binomial))))
pred0 <- plogis(quantile(sims%*%c(1,0), c(0.025, 0.5, 0.975)))
pred1 <- plogis(quantile(sims%*%c(1,1), c(0.025, 0.5, 0.975)))
rbind(pred0, pred1)

# the upper bound of the prediction for x=0 is greater than the lower bound of the prediction for x=1, using both methods

Search Discussions

  • Bert Gunter at May 3, 2012 at 6:10 pm
    Your post suggests statistical confusion on several levels. But as
    this concerns statistics, not R, consult your local statistician or
    post on a statistical help list (e.g. stats.stackexchange.com).

    Incidentally, the short answer to your question about the overlap is:
    why shouldn't they? You will hopefully receive a more informative
    longer answer from the above suggested (or similar) resources.

    -- Bert


    On Thu, May 3, 2012 at 9:37 AM, Malcolm Fairbrother
    wrote:
    Dear list,

    I'm a bit perplexed why the 95% confidence bands for the predicted probabilities for units where x=0 and x=1 overlap in the following instance.

    I've simulated binary data to which I've then fitted a simple logistic regression model, with one covariate, and the coefficient on x is statistically significant at the 0.05 level. I've then used two different methods to generate 95% confidence bands for predicted probabilities, for each of two possible values of x. Given the result of the model, I expected the bands not to overlap? but they do.

    Can anyone please explain where I've gone wrong with the following code, OR why it makes sense for the bands to overlap, despite the statistically significant beta coefficient? There may be a good statistical reason for this, but I'm not aware of it.

    Many thanks,
    Malcolm Fairbrother


    n <- 120
    set.seed(030512)
    x <- rbinom(n, 1, 0.5)
    dat <- within(data.frame(x), ybe <- rbinom(n, 1, plogis(-0.5 + x)))
    mod1 <- glm(ybe ~ x, dat, family=binomial)
    summary(mod1) # coefficient on x is statistically significant at the 0.05 level? almost at the 0.01 level

    pred <- predict(mod1, newdata=data.frame(x=c(0,1)), se.fit=T)
    with(pred, cbind(low = plogis(fit - 1.96*se.fit), est = plogis(fit), up = plogis(fit + 1.96*se.fit))) ?# confidence bands based on SEs

    # simulation-based confidence bands:
    sims <- t(replicate(200, coef(glm(simulate(mod1)$sim_1 ~ x, data=dat, family=binomial))))
    pred0 <- plogis(quantile(sims%*%c(1,0), c(0.025, 0.5, 0.975)))
    pred1 <- plogis(quantile(sims%*%c(1,1), c(0.025, 0.5, 0.975)))
    rbind(pred0, pred1)

    # the upper bound of the prediction for x=0 is greater than the lower bound of the prediction for x=1, using both methods

    ______________________________________________
    R-help at r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.


    --

    Bert Gunter
    Genentech Nonclinical Biostatistics

    Internal Contact Info:
    Phone: 467-7374
    Website:
    http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
  • Malcolm Fairbrother at May 6, 2012 at 6:18 pm
    For the record, I was overlooking the obvious point that the statistical significance of the coefficient indicates only that the confidence interval for the *difference* in predicted probabilities will not overlap zero. It does not indicate that the two confidence intervals I was originally calculating will not overlap each other.

    In case it's useful to others, the code below generates a confidence interval for the difference, on the response (probability) scale.

    - Malcolm

    # CI by SE:
    sapply(c(1.96,-1.96), function(a) plogis(sum(coef(mod1)) - a*summary(mod1)$coefficients[2,"Std. Error"]) - plogis(coef(mod1)[1]))

    # CI by simulation:
    quantile(plogis(sims%*%c(1,1)) - plogis(sims%*%c(1,0)), c(0, 0.025, 0.5, 0.975, 1))


    On 3 May 2012, at 19:10, Bert Gunter wrote:

    Your post suggests statistical confusion on several levels. But as
    this concerns statistics, not R, consult your local statistician or
    post on a statistical help list (e.g. stats.stackexchange.com).

    Incidentally, the short answer to your question about the overlap is:
    why shouldn't they? You will hopefully receive a more informative
    longer answer from the above suggested (or similar) resources.

    -- Bert


    On Thu, May 3, 2012 at 9:37 AM, Malcolm Fairbrother
    wrote:
    Dear list,

    I'm a bit perplexed why the 95% confidence bands for the predicted probabilities for units where x=0 and x=1 overlap in the following instance.

    I've simulated binary data to which I've then fitted a simple logistic regression model, with one covariate, and the coefficient on x is statistically significant at the 0.05 level. I've then used two different methods to generate 95% confidence bands for predicted probabilities, for each of two possible values of x. Given the result of the model, I expected the bands not to overlap? but they do.

    Can anyone please explain where I've gone wrong with the following code, OR why it makes sense for the bands to overlap, despite the statistically significant beta coefficient? There may be a good statistical reason for this, but I'm not aware of it.

    Many thanks,
    Malcolm Fairbrother


    n <- 120
    set.seed(030512)
    x <- rbinom(n, 1, 0.5)
    dat <- within(data.frame(x), ybe <- rbinom(n, 1, plogis(-0.5 + x)))
    mod1 <- glm(ybe ~ x, dat, family=binomial)
    summary(mod1) # coefficient on x is statistically significant at the 0.05 level? almost at the 0.01 level

    pred <- predict(mod1, newdata=data.frame(x=c(0,1)), se.fit=T)
    with(pred, cbind(low = plogis(fit - 1.96*se.fit), est = plogis(fit), up = plogis(fit + 1.96*se.fit))) # confidence bands based on SEs

    # simulation-based confidence bands:
    sims <- t(replicate(200, coef(glm(simulate(mod1)$sim_1 ~ x, data=dat, family=binomial))))
    pred0 <- plogis(quantile(sims%*%c(1,0), c(0.025, 0.5, 0.975)))
    pred1 <- plogis(quantile(sims%*%c(1,1), c(0.025, 0.5, 0.975)))
    rbind(pred0, pred1)

    # the upper bound of the prediction for x=0 is greater than the lower bound of the prediction for x=1, using both methods

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupr-help @
categoriesr
postedMay 3, '12 at 4:37p
activeMay 6, '12 at 6:18p
posts3
users2
websiter-project.org
irc#r

People

Translate

site design / logo © 2023 Grokbase