Grokbase Groups R r-help August 2010
A closer look at the help for predict.glm will reveal that the function
accepts a 'type' argument.
In your case, 'type = "response"' will give you the results as probabilities
(which seems to be what you are looking for).
There is also an example of the use of the 'type' argument at the end of the
page.

Stefano

-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On behalf of Troy S
Sent: Friday, August 06, 2010 6:31 PM
To: Michael Bedward
Cc: r-help at r-project.org
Subject: Re: [R] Confidence Intervals for logistic regression


Michael,

Thanks for the reply. I believe Aline was giving me CIs on coefficients
as well.

So c(pred$fit + 1.96 * pred$se.fit, pred$fit - 1.96 *
pred$se.fit) gives me the CI on the logits, if I understand correctly? Maybe
the help on predict.glm could be updated.

Thanks!
On 6 August 2010 01:46, Michael Bedward wrote:

Sorry about earlier reply - didn't read your email properly (obviously :)

Your suggestion was right, so as well as the method for Aline below,
another way of doing the same thing is:

pred <- predict(y.glm, newdata = something, se.fit = TRUE)
ci <- matrix(c(pred$fit + 1.96 * pred$se.fit,
               pred$fit - 1.96 * pred$se.fit), ncol = 2)

lines(something, plogis(ci[, 1]))
lines(something, plogis(ci[, 2]))
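Michael's snippet assumes an already-fitted model y.glm and an existing plot. A self-contained sketch of the same idea (the data and all object names below are invented for illustration):

```r
# Hypothetical example data
set.seed(1)
something <- seq(-3, 3, length.out = 100)
y <- rbinom(length(something), size = 1, prob = plogis(0.5 + 1.2 * something))
y.glm <- glm(y ~ something, family = binomial)

# Predictions and standard errors on the link (logit) scale
pred <- predict(y.glm, newdata = data.frame(something = something),
                se.fit = TRUE)
ci <- cbind(pred$fit - 1.96 * pred$se.fit,
            pred$fit + 1.96 * pred$se.fit)

# Transform the fitted curve and the CI endpoints to probabilities
plot(something, plogis(pred$fit), type = "l", ylim = c(0, 1),
     xlab = "something", ylab = "P(y = 1)")
lines(something, plogis(ci[, 1]), lty = 2)
lines(something, plogis(ci[, 2]), lty = 2)
```

Note that the interval is computed on the logit scale and only then mapped through plogis(), so the band stays inside [0, 1].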


On 6 August 2010 18:39, aline uwimana wrote:
Dear Troy,
use this command; you will get the 95% CI and the OR.

logistic.model <- glm(formula = y ~ x1 + x2, family = binomial)
summary(logistic.model)

sum.coef <- summary(logistic.model)$coef

est <- exp(sum.coef[, 1])
upper.ci <- exp(sum.coef[, 1] + 1.96 * sum.coef[, 2])
lower.ci <- exp(sum.coef[, 1] - 1.96 * sum.coef[, 2])

cbind(est, upper.ci, lower.ci)

regards.

2010/8/6 Troy S <troysocks-twigs@yahoo.com>
Dear UseRs,

I have fitted a logistic regression using glm and want a 95% confidence
interval on a response probability. Can I use

predict(model, newdata, se.fit=T)

Will fit +/- 1.96*se give me a 95% CI on the logit? And then
exp(fit +/- 1.96*se) / (1 + exp(fit +/- 1.96*se)) to get the probabilities?

Troy


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



  • Troy S at Aug 7, 2010 at 5:57 am
    Stefano,

    I was aware of this option. I was assuming it was not ok to do fit +/- 1.96
    se when you requested probabilities. If this is legitimate then all the
    better.

    Thanks!
    Troy
    On 6 August 2010 22:25, Guazzetti Stefano wrote:
    [snip]
  • Michael Bedward at Aug 7, 2010 at 7:29 am

    I was aware of this option. I was assuming it was not ok to do fit +/- 1.96
    se when you requested probabilities. If this is legitimate then all the
    better.
    I don't think it is. I understood that you should do the calculation
    in the scale of the linear predictor and then transform to
    probabilities.

    Happy to be corrected if that's wrong.

    Michael
  • Peter Dalgaard at Aug 7, 2010 at 8:37 am

    Michael Bedward wrote:
    I was aware of this option. I was assuming it was not ok to do fit +/- 1.96
    se when you requested probabilities. If this is legitimate then all the
    better.
    I don't think it is. I understood that you should do the calculation
    in the scale of the linear predictor and then transform to
    probabilities.

    Happy to be corrected if that's wrong.
    Probably, neither is optimal, although any transformed scale is
    asymptotically equivalent. E.g., neither the probability scale nor the
    logit scale stabilizes the variance of a simple proportion (the arcsine
    transform does), so test-based CIs should really be asymmetric in both
    cases rather than just +/- 1.96se.

    However, working on the linear predictor scale has the advantage that
    CIs by definition will not cross the boundaries of the parameter space.
    (For the "usual" link functions: logit, probit, cloglog, that is; it's
    not true for the identity link, obviously.)

    --
    Peter Dalgaard
    Center for Statistics, Copenhagen Business School
    Phone: (+45)38153501
    Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
  • Michael Bedward at Aug 7, 2010 at 9:29 am
    Thanks for that clarification Peter - much appreciated.

    Is there an R function that you'd recommend for calculating more valid CIs ?

    Michael
    On 7 August 2010 18:37, Peter Dalgaard wrote:

    Probably, neither is optimal, although any transformed scale is
    asymptotically equivalent. E.g., neither the probability scale nor the
    logit scale stabilizes the variance of a simple proportion (the arcsine
    transform does), so test-based CIs should really be asymmetric in both
    cases rather than just +/- 1.96se.

    However, working on the linear predictor scale has the advantage that
    CIs by definition will not cross the boundaries of the parameter space.
    (For the "usual" link functions: logit, probit, cloglog, that is; it's
    not true for the identity link, obviously.)

    --
    Peter Dalgaard
    Center for Statistics, Copenhagen Business School
    Phone: (+45)38153501
    Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
  • Ted Harding at Aug 7, 2010 at 10:42 am

    On 07-Aug-10 09:29:41, Michael Bedward wrote:
    Thanks for that clarification Peter - much appreciated.

    Is there an R function that you'd recommend for calculating
    more valid CIs ?
    Michael
    It depends on what you want to mean by "more valid"! If you have
    a 95% CI for the linear predictor (say L(x) at X=x), then the
    probability that the CI will include the true value of L(x)
    is 95% (more or less accurately, depending on what approximation,
    if any, was used to obtain the CI). Thus, if A(Y) and B(Y) are the
    lower and upper limits of a 95% CI for L(x) as functions of the data Y,

    P(A(Y) < L(x) < B(Y)) = 0.95 (to within approximation)

    and this may be asymmetrical in that we may have
    P(A(Y) > L(x)) = 0.05 - P(B(Y) < L(x)) != 0.025
    (e.g. it may come out as a 1%:4% split of the 5%).

    The response probability P(Y=1 | X=x) will be a monotonic function
    F(L(x)) of L(x) -- e.g. for the logistic, exp(L)/(1+exp(L)), increasing
    from 0 to 1. Then {F(A(Y)), F(B(Y))} is a 95% CI for P(x) = F(L(x)),
    since P[A(Y) < L(x) < B(Y)] = P[F(A(Y)) < F(L(x)) < F(B(Y))]. Also,
    P[A(Y) > L(x)] = P[F(A(Y)) > F(L(x))] and
    P[B(Y) < L(x)] = P[F(B(Y)) < F(L(x))]
    for exactly the same reason (monotonicity of F). Hence the split
    of the 5% between left tail and right tail on the response scale
    P(x) = F(L(x)) is exactly the same as the split on the linear
    predictor scale L(x).

    Therefore, on that front (comparison of probabilities of coverage),
    the CI transformed to the response scale {F(A(Y)), F(B(Y))} is exactly
    as valid as the CI {A(Y), B(Y)} on the original linear predictor scale.
    In particular, if the latter is "equi-tailed" (2.5% on either side)
    then the former will be too. If that is what you mean by "valid",
    then you're finished.

    However, possibly you may want "valid" to mean "extending to equal
    distances on either side of the point estimate" -- e.g. as you
    do with Estimate +/- 1.96*SE. It may be that, on the linear predictor
    scale, you achieve this and also equi-tailed (2.5% either way).

    But then, when you transform to the response scale, you will lose
    that symmetry: F(Est - 1.96*SE) and F(Est + 1.96*SE) will not
    be equidistant from F(Est) (though the equi-tailed 2.5%:2.5%
    of the tail probability will be preserved).

    If you have a reason for wanting to, you can start with a 95% CI
    for L(x) which is not equi-tailed, but does have the property of
    symmetry on the response scale. Set up the CI for
    L(x) as {A(x) = Est - c0(x)*SE, B(x) = Est + c1(x)*SE}, where c0(x)
    and c1(x) (which in general depend on x) are chosen so that
    F(A(x)) and F(B(x)) are equidistant from F(Est). But then you will
    lose the equi-tailed property on the linear predictor scale, hence
    also on the response scale.

    So you can't have everything at once, and it depends on what you
    want to mean by "valid"!

    However, in the case of response being the probability of Y=1,
    you might want to be careful about symmetry on the response
    scale, since that could result in a CI which goes above 1 or
    below 0, which would not be "valid" ...

    For large samples, asymptotically all these issues tend to dwindle
    into near-irrelevance, since locally the response is close to linear
    and whatever you achieve on one scale will be (close to) achieved
    on the other scale.
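    Ted's coverage argument can be checked numerically. A small simulation
    (an entirely hypothetical setup) shows that transforming the interval
    endpoints by the monotone plogis() leaves coverage, and the tail
    split, unchanged:

```r
# True logit and probability for a single binomial proportion
set.seed(7)
true.logit <- 0.4
true.prob <- plogis(true.logit)
n <- 200
reps <- 2000
covered.logit <- covered.prob <- logical(reps)

for (i in seq_len(reps)) {
  p.hat <- rbinom(1, n, true.prob) / n
  l.hat <- qlogis(p.hat)
  se <- 1 / sqrt(n * p.hat * (1 - p.hat))  # delta-method SE on the logit scale
  lo <- l.hat - 1.96 * se
  hi <- l.hat + 1.96 * se
  # Coverage on the linear predictor (logit) scale ...
  covered.logit[i] <- (lo < true.logit) && (true.logit < hi)
  # ... and after transforming the same endpoints to probabilities
  covered.prob[i] <- (plogis(lo) < true.prob) && (true.prob < plogis(hi))
}

# Identical, by monotonicity of plogis
all(covered.logit == covered.prob)
```

    The two coverage indicators agree replicate by replicate, which is
    exactly the monotonicity point made above.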

    Hoping this helps,
    Ted.






    --------------------------------------------------------------------
    E-Mail: (Ted Harding) <ted.harding@manchester.ac.uk>
    Fax-to-email: +44 (0)870 094 0861
    Date: 07-Aug-10 Time: 11:42:09
    ------------------------------ XFMail ------------------------------
  • Ben Bolker at Aug 7, 2010 at 2:43 pm
    <Ted.Harding <at> manchester.ac.uk> writes:
    On 07-Aug-10 09:29:41, Michael Bedward wrote:
    Thanks for that clarification Peter - much appreciated.

    Is there an R function that you'd recommend for calculating
    more valid CIs ?
    Michael
    It depends on what you want to mean by "more valid"! If you have
    a 95% CI for the linear predictor (say L(x) at X=x), then the
    probability that the CI will include the true value of L(x)
    is 95% (more or less accurately, depending on what approximation,
    if any, was used to obtain the CI). Thus, if A(Y) and B(Y) are the
    lower and upper limits of a 95% CI for L(x) as functions of the data Y, [snip]
    So you can't have everything at once, and it depends on what you
    want to mean by "valid"!
    [snip]

    Yow. Nothing like r-help on a Saturday morning!
    Practically speaking, I think the previous recommendations (confint()
    and boot.ci()) are probably best. I prefer equal probabilities
    in tails to a symmetric confidence interval. (You could also try
    for Bayesian credible intervals, which are symmetric in the probability
    cutoff for each side ...)
    The other thing to keep in mind is that once you get down to
    this level of rigor, it's extremely likely that the major source
    of error/uncertainty in your results will be some other
    systematic error or violation of the assumptions. Are your data
    *really* distributed the way you think? Linear on the scale of the
    linear predictor, independent, homogeneous probabilities/exchangeability
    within groups, ... ?
    Or maybe that's just my excuse for not worrying too much.
  • Martin Maechler at Aug 7, 2010 at 9:56 am

    "PD" == Peter Dalgaard <pdalgd@gmail.com>
    on Sat, 07 Aug 2010 10:37:49 +0200 writes:
    PD> Michael Bedward wrote:
    I was aware of this option. I was assuming it was not ok to do fit +/- 1.96
    se when you requested probabilities. If this is legitimate then all the
    better.
    >>
    I don't think it is. I understood that you should do the calculation
    in the scale of the linear predictor and then transform to
    probabilities.
    >>
    Happy to be corrected if that's wrong.
    PD> Probably, neither is optimal, although any transformed scale is
    PD> asymptotically equivalent. E.g., neither the probability scale nor the
    PD> logit scale stabilizes the variance of a simple proportion (the arcsine
    PD> transform does), so test-based CIs should really be asymmetric in both
    PD> cases rather than just +/- 1.96se.

    PD> However, working on the linear predictor scale has the advantage that
    PD> CIs by definition will not cross the boundaries of the parameter space.
    PD> (For the "usual" link functions: logit, probit, cloglog, that is; it's
    PD> not true for the identity link, obviously.)

    I'm coming late to the thread,
    but it seems that nobody has yet given the advice which I would
    very *strongly* suggest to anyone asking for confidence
    intervals in GLMs:

    Use confint(),
    which (for "glm"s) will load the MASS package and use likelihood
    profiling, giving (much) more reliable confidence intervals,
    notably in the case of the infamous Hauck-Donner phenomenon,
    which has been mentioned many times "here", and notably in
    MASS (the book!), I think even in its first edition.

    Even more reliable (probably) would be to use the (recommended)
    'boot' package and use bootstrap confidence intervals, i.e.,
    boot.ci() there.
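    A minimal sketch of both suggestions on simulated data (the data and
    all object names here are invented for illustration):

```r
# Hypothetical data: binary outcome with one predictor
set.seed(42)
x <- rnorm(100)
y <- rbinom(100, size = 1, prob = plogis(0.3 + 0.8 * x))
d <- data.frame(x = x, y = y)
fit <- glm(y ~ x, family = binomial, data = d)

# Profile-likelihood CIs for the coefficients (confint.glm uses MASS)
confint(fit)

# Bootstrap CI for a predicted probability at x = 0, via the boot package
library(boot)
stat <- function(data, idx) {
  m <- glm(y ~ x, family = binomial, data = data[idx, ])
  predict(m, newdata = data.frame(x = 0), type = "response")
}
b <- boot(d, stat, R = 999)
boot.ci(b, type = "perc")
```

    Note that confint() gives intervals for the coefficients, while the
    bootstrap above targets a predicted probability directly.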

    Martin Maechler, ETH Zurich
  • Michael Bedward at Aug 7, 2010 at 10:29 am

    On 7 August 2010 19:56, Martin Maechler wrote:

    I'm coming late to the thread,
    but it seems that nobody has yet given the advice which I would
    very *strongly* suggest to anyone asking for confidence
    intervals in GLMs:

    Use confint()
    confint was actually mentioned in the second post on the thread :)
    But it's not what the OP is after.

    Michael
  • Peter Dalgaard at Aug 7, 2010 at 11:46 am

    Michael Bedward wrote:
    On 7 August 2010 19:56, Martin Maechler wrote:

    I'm coming late to the thread,
    but it seems that nobody has yet given the advice which I would
    very *strongly* suggest to anyone asking for confidence
    intervals in GLMs:

    Use confint()
    confint was actually mentioned in the second post on the thread :)
    But it's not what the OP is after.
    Yep. The problem is that we cannot (yet? easily?) do confint on a
    transformation of the parameters, hence not on the predicted values (on
    whatever scale).

    Another thing is that confint() isn't quite a cure-all because it relies
    on the distribution of the (signed square root of) the LRT, and if that
    is substantially different from chi-square(1) (or N(0,1)), then you
    might still not obtain improved coverage, etc.

    That being said, one should probably say that it is quite common to base
    CI's on the linear predictor scale, and there's nothing REALLY wrong
    with that procedure. As will have transpired by now, once you leave this
    rather well-trodden road, you can easily find yourself in a rather
    impenetrable thicket...

    As far as I recall the asymptotics, all of the "+/- 2*se" procedures
    have approximation errors of the same order as that caused by the
    discrete distribution of the response, so it is not like one method is
    going to "win" by orders of magnitude.

    --
    Peter Dalgaard
    Center for Statistics, Copenhagen Business School
    Phone: (+45)38153501
    Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com

Discussion Overview
group: r-help
categories: r
posted: Aug 7, '10 at 5:25a
active: Aug 7, '10 at 2:43p
posts: 10
users: 7
website: r-project.org
irc: #r
