[R] Optimal Y>=q cutoff after logistic regression
Hi,

I understand that dichotomization of the predicted probabilities after
logistic regression is philosophically questionable, throwing out
information, etc.

But I want to do it anyway. I'd like to include as a measure of fit %
of observations correctly classified because it's measured in units
that non-statisticians can understand more easily than area under the
ROC curve, Dxy, etc.

Am I right that there is an optimal Y>=q probability cutoff, at which
the True Positive Rate is high and the False Positive Rate is low?
Visually, it would be the elbow in the ROC curve, right?
My reasoning is that even if you had a near-perfect model, you could
set a stupidly low (high) cutoff and have a higher false positive
(negative) rate than would be optimal.

I know the standard default or starting point is Y>=.5, but if my
above reasoning is correct, there ought to be an optimal cutoff for a
given model. Is there an easy way to determine that cutoff in R
without writing my own script to iterate through possible breakpoints
and calculating classification accuracy at each one?

Thanks in advance.
-Dan
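
(For reference, the brute-force search described above is only a few lines of base R. Everything in the sketch below is illustrative: the data are simulated, and the cutoff grid and variable names are made up for the example.)

```r
## Sketch: brute-force search for the accuracy-maximizing cutoff.
## Simulated data; the model and the cutoff grid are illustrative.
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(2 * x))   # true model: logit(p) = 2x
fit <- glm(y ~ x, family = binomial)
p <- fitted(fit)                     # predicted probabilities

cutoffs <- seq(0.05, 0.95, by = 0.01)
acc <- sapply(cutoffs, function(q) mean((p >= q) == y))
best <- cutoffs[which.max(acc)]      # accuracy-maximizing cutoff
best
max(acc)                             # % correctly classified at that cutoff
```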


  • David Winsemius at Feb 14, 2011 at 5:45 am

    On Feb 14, 2011, at 12:31 AM, Daniel Weitzenfeld wrote:


    Am I right that there is an optimal Y>=q probability cutoff, at which
    the True Positive Rate is high and the False Positive Rate is low?
    Only if the data supports it.
    Visually, it would be the elbow in the ROC curve, right?
    If there is an "elbow", perhaps. The real answer is that you should
    thoughtfully consider the consequences of a wrong answer that the test
    is negative (false negative) and those of a wrong answer that the test
    is positive (false positive), and then make a decision that properly
    balances both the costs and the probabilities.
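
    In the simplest formulation of that balancing act (zero cost for
    correct calls), the cutoff has a closed form: the expected cost of
    predicting positive is (1 - p) * c.fp and of predicting negative is
    p * c.fn, so predict positive when p > c.fp / (c.fp + c.fn). The cost
    values below are illustrative:

```r
## Predict positive when (1 - p) * c.fp < p * c.fn,
## i.e. when p > c.fp / (c.fp + c.fn).
c.fp <- 1   # cost of a false positive (illustrative)
c.fn <- 4   # cost of a false negative (illustrative)
cutoff <- c.fp / (c.fp + c.fn)
cutoff      # 0.2: a costly false negative pulls the cutoff down
```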


    I know the standard default or starting point is Y>=.5,
    Huh... what is Y?
    but if my
    above reasoning is correct, there ought to be an optimal cutoff for a
    given model. Is there an easy way to determine that cutoff in R
    without writing my own script to iterate through possible breakpoints
    and calculating classification accuracy at each one?
    There are packages that handle ROC analyses.
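
    One such package is pROC (assumed installed here); its coords(..., "best")
    reports the threshold maximizing Youden's J (sensitivity + specificity - 1).
    The data and model below are simulated for illustration:

```r
## Sketch with the pROC package: Youden-optimal threshold from an ROC curve.
library(pROC)
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(2 * x))        # simulated outcome
p <- fitted(glm(y ~ x, family = binomial))

r <- roc(y, p)
coords(r, "best", best.method = "youden",
       ret = c("threshold", "sensitivity", "specificity"))
auc(r)   # area under the ROC curve, for comparison
```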
    Thanks in advance.
    -Dan

    ______________________________________________
    R-help at r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.
    David Winsemius, MD
    West Hartford, CT
  • Frank Harrell at Feb 14, 2011 at 1:49 pm
    It is very seldom that such a cutoff is real and validates in another
    dataset. As described so well in Steyerberg's book Clinical Prediction
    Models, there are many good ways to present models to non-statisticians.
    Nomograms and calibration curves with histograms of predicted probabilities
    are two good ones.
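
    Both displays can be sketched with Harrell's own rms package (assumed
    installed; the data below are simulated for illustration):

```r
## Sketch: nomogram and bootstrap calibration curve with the rms package.
library(rms)
set.seed(1)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, 1, plogis(2 * d$x))   # simulated outcome
dd <- datadist(d); options(datadist = "dd")

f <- lrm(y ~ x, data = d, x = TRUE, y = TRUE)
plot(nomogram(f, fun = plogis, funlabel = "Pr(Y = 1)"))  # nomogram
cal <- calibrate(f, B = 100)   # bootstrap overfitting-corrected calibration
plot(cal)
```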

    There is a reason that the speedometer in your car doesn't just read "slow"
    and "fast".

    Frank

    -----
    Frank Harrell
    Department of Biostatistics, Vanderbilt University
    --
    View this message in context: http://r.789695.n4.nabble.com/Optimal-Y-q-cutoff-after-logistic-regression-tp3304474p3305012.html
    Sent from the R help mailing list archive at Nabble.com.
  • Viechtbauer Wolfgang (STAT) at Feb 14, 2011 at 2:30 pm
    That's definitely one for the fortune package!

    Wolfgang
    -----Original Message-----
    From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
    On Behalf Of Frank Harrell
    Sent: Monday, February 14, 2011 14:50
    To: r-help at r-project.org
    Subject: Re: [R] Optimal Y>=q cutoff after logistic regression



Discussion Overview
Group: r-help @ r-project.org
Categories: r
Posted: Feb 14, '11 at 5:31a
Active: Feb 14, '11 at 2:30p
Posts: 4
Users: 4
Website: r-project.org
IRC: #r
