Grokbase Groups: r-help, March 2002 -- step, leaps, lasso, LSE or what?
Hi,

I am trying to understand the alternative methods that are available for
selecting variables in a regression without simply imposing my own bias
(having "good judgement"). The methods implemented in leaps, step, and
stepAIC seem to fall into the general class of stepwise procedures. But
these are commonly condemned for inducing overfitting.
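
As an illustration of the stepwise route, here is a minimal sketch on
simulated data (stepAIC is in the MASS package; the data are made up):

## Backward stepwise selection by AIC on simulated data.
library(MASS)

set.seed(1)
n <- 100
d <- as.data.frame(matrix(rnorm(n * 6), n, 6))
names(d) <- paste0("x", 1:6)
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n)  # only x1 and x2 truly matter

full <- lm(y ~ ., data = d)
sel  <- stepAIC(full, direction = "backward", trace = FALSE)
formula(sel)  # noise variables can and do survive the search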

In Hastie, Tibshirani and Friedman, "The Elements of Statistical
Learning", chapter 3, they describe a number of procedures that seem
better. The use of cross-validation in the training stage presumably
helps guard against overfitting. They seem particularly favorable to
shrinkage through ridge regressions, and to the "lasso". This may not be
too surprising, given the authorship. Is the lasso "generally accepted"
as being a pretty good approach? Has it proved its worth on a variety of
problems? Or is it at the "interesting idea" stage? What, if anything,
would be widely accepted as being sensible -- apart from having "good
judgement"?
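
To fix ideas, here is a rough sketch of both shrinkage methods on
simulated data (lm.ridge is in MASS; the lasso fit below uses the glmnet
package, one implementation among several):

## Ridge regression via MASS::lm.ridge; lasso via glmnet, with the
## penalty chosen by cross-validation.
library(MASS)
library(glmnet)

set.seed(1)
n <- 100; p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(n)

ridge <- lm.ridge(y ~ x, lambda = seq(0, 10, by = 0.1))
select(ridge)                        # HKB, L-W and GCV choices of lambda

cvfit <- cv.glmnet(x, y, alpha = 1)  # alpha = 1 gives the lasso penalty
coef(cvfit, s = "lambda.1se")        # many coefficients exactly zero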

In econometrics there is a school (the "LSE methodology") which argues
for what amounts to stepwise regressions combined with repeated tests of
the properties of the error terms. (It is actually a bit more complex
than that.) This has been coded in the program PCGets:
http://www.pcgive.com/pcgets/index.html?content=/pcgets/main.html
If anyone knows how this compares in terms of effectiveness to the
methods discussed in Hastie et al., I would really be very interested.

Cheers,
Murray

Murray Z. Frank
B.I. Ghert Family Foundation Professor
Strategy & Business Economics
Faculty of Commerce
University of British Columbia
Vancouver, B.C.
Canada V6T 1Z2

phone: 604-822-8480
fax: 604-822-8477
e-mail: Murray.Frank at commerce.ubc.ca

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


  • Prof Brian D Ripley at Mar 1, 2002 at 7:26 am

    On Thu, 28 Feb 2002, Frank, Murray wrote:

    > Hi,
    >
    > I am trying to understand the alternative methods that are
    > available for selecting variables in a regression without simply
    > imposing my own bias (having "good judgement"). The methods
    > implemented in leaps, step, and stepAIC seem to fall into the
    > general class of stepwise procedures. But these are commonly
    > condemned for inducing overfitting.

    There are big differences between regression with only continuous variates,
    and regression involving hierarchies of factors. step/stepAIC include the
    latter, the rest do not.

    A second difference is the purpose of selecting a model. AIC is intended
    to select a model which is large enough to include the `true' model, and
    hence to give good predictions. There over-fitting is not a real problem.
    (There are variations on AIC which do not assume some model considered is
    true.) This is a different aim from trying to find the `true' model or
    trying to find the smallest adequate model, both aims for explanation not
    prediction. AIC is often criticised (`condemned') for not being good at
    what it does not intend to do. [Sometimes R is, too.]
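
    For reference, the criterion itself is AIC = -2 log L + 2p, smaller
    being better; a toy comparison on simulated data:

    ## AIC() computes -2*logLik + 2*df; stepAIC uses the same criterion.
    set.seed(1)
    d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
    d$y <- d$x1 + rnorm(50)

    fit1 <- lm(y ~ x1, data = d)
    fit2 <- lm(y ~ x1 + x2, data = d)
    AIC(fit1, fit2)  # the larger model is not automatically preferred

    ll <- logLik(fit1)
    -2 * as.numeric(ll) + 2 * attr(ll, "df")  # matches AIC(fit1)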

    Shrinkage methods have their advocates for good predictions (including me),
    but they are a different class of statistical methods, that is *not*
    regression. They too have issues of selection, usually how much to shrink
    and often how to calibrate equal shrinkage across predictors. In ridge
    regression choosing the ridge coefficient is not easy, and depends on the
    scaling of the variables. In the neural networks field, shrinkage is widely
    used.
    > In Hastie, Tibshirani and Friedman, "The Elements of Statistical
    > Learning", chapter 3, they describe a number of procedures that
    > seem better.

    I think that is a quite selective account.

    > The use of cross-validation in the training stage presumably helps
    > guard against overfitting. They seem particularly favorable to
    > shrinkage through ridge regressions, and to the "lasso". This may
    > not be too surprising, given the authorship. Is the lasso
    > "generally accepted" as being a pretty good approach? Has it
    > proved its worth on a variety of problems? Or is it at the
    > "interesting idea" stage? What, if anything, would be widely
    > accepted as being sensible -- apart from having "good judgement"?

    Depends on the aim. If you look at the account in Venables & Ripley you
    will see many caveats about any automated method: all statistical problems
    (outside textbooks) come with a context which should be used in selecting
    variables if the aim is explanation, and perhaps also if it is prediction.
    You should use what you know about the variables and the possible
    mechanisms, especially to select derived variables. But generally model
    averaging (which you have not mentioned and is for regression a form of
    shrinkage) seems to have most support for prediction.
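
    A crude sketch of the idea, weighting a few candidate fits by Akaike
    weights (one simple scheme among several; the data are simulated,
    and Raftery's Bayesian version is the one usually cited):

    ## Model averaging: weight each candidate model's prediction by its
    ## Akaike weight, exp(-(AIC - min AIC)/2), normalised to sum to 1.
    set.seed(1)
    d <- data.frame(x1 = rnorm(80), x2 = rnorm(80), x3 = rnorm(80))
    d$y <- d$x1 - d$x2 + rnorm(80)

    fits <- list(lm(y ~ x1, data = d),
                 lm(y ~ x1 + x2, data = d),
                 lm(y ~ x1 + x2 + x3, data = d))
    aic <- sapply(fits, AIC)
    w   <- exp(-(aic - min(aic)) / 2)
    w   <- w / sum(w)

    newd <- data.frame(x1 = 0.5, x2 = -0.5, x3 = 0)
    sum(w * sapply(fits, predict, newdata = newd))  # averaged prediction
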
    > In econometrics there is a school (the "LSE methodology") which
    > argues for what amounts to stepwise regressions combined with
    > repeated tests of the properties of the error terms. (It is
    > actually a bit more complex than that.) This has been coded in the
    > program PCGets:
    > http://www.pcgive.com/pcgets/index.html?content=/pcgets/main.html

    Lots of hyperbolic claims, no references. But I suspect this is `ex-LSE'
    methodology, associated with Hendry's group (as PcGive and Ox are), and
    there is a link to Hendry (who is in Oxford).
    > If anyone knows how this compares in terms of effectiveness to the
    > methods discussed in Hastie et al., I would really be very
    > interested.

    It has a different aim, I believe. Certainly `effectiveness' has to be
    assessed relative to a clear aim, and simulation studies with true models
    don't seem to me to have the right aim. Statisticians of the Box/Cox/Tukey
    generation would say that effectiveness in deriving scientific insights
    was the real test (and I recall hearing that from those I named).

    Chapter 2 of my `Pattern Recognition and Neural Networks' takes a
    much wider view of the methods available for model selection, and
    their philosophies. Specifically for regression, you might take a
    look at Frank Harrell's book.

    --
    Brian D. Ripley,                  ripley at stats.ox.ac.uk
    Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
    University of Oxford              Tel: +44 1865 272861 (self)
    1 South Parks Road                     +44 1865 272860 (secr)
    Oxford OX1 3TG, UK                Fax: +44 1865 272595


  • Jari Oksanen at Mar 1, 2002 at 11:03 am

    ripley at stats.ox.ac.uk said:
    > A second difference is the purpose of selecting a model. AIC is
    > intended to select a model which is large enough to include the
    > `true' model, and hence to give good predictions. There
    > over-fitting is not a real problem. (There are variations on AIC
    > which do not assume some model considered is true.) This is a
    > different aim from trying to find the `true' model or trying to
    > find the smallest adequate model, both aims for explanation not
    > prediction.

    This may be a stupid question, but perhaps I won't be lashed if I
    confess my stupidity as a preventive measure. About minimal adequate
    model*s*: Murray Aitkin et al. have a book called "Statistical
    Modelling in GLIM" (Oxford UP, 1989) where they tell how to find a
    set of adequate models in glm (with GLIM), and how one or *several*
    of these adequate models may be minimal. When I read the book, I
    found this an attractive concept, since it showed that you may have
    several about equally good models with different terms, although the
    usual selection procedures (including best subsets) find only one. I
    have quite often seen people use automatic selection in several
    subsets and then say that the subsets are different because
    different regressors were selected -- although the same regressors
    could have been about as good, but they were never evaluated.
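
    The point is easy to see with a best-subsets search that keeps
    several near-ties per model size; a sketch with the leaps package
    (its regsubsets function), on simulated data:

    ## Keep the 3 best subsets of each size: near-equal BIC values show
    ## several "about equally good" models with different terms.
    library(leaps)

    set.seed(1)
    d <- as.data.frame(matrix(rnorm(100 * 6), 100, 6))
    names(d) <- paste0("x", 1:6)
    d$y <- d$x1 + d$x2 + 0.9 * d$x3 + rnorm(100)

    ss <- summary(regsubsets(y ~ ., data = d, nbest = 3))
    cbind(ss$which + 0, BIC = round(ss$bic, 1))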

    Now the question: Aitkin's procedure would be very easy to perform
    in R (well, it was easy even in dear old GLIM!), but I have hardly
    seen it used. Is there a reason for this? Is there something dubious
    in minimal adequate models that makes them a no-no, an Erlkönig that
    catches us innocent children?

    Bibliographic note: I know the procedure from the Aitkin et al.
    book, and haven't checked the original references. These sources are
    cited in the book:

    Aitkin, M. A. 1974. Simultaneous inference and choice of variable
    subsets in multiple regression. Technometrics 16, 221--227.

    Aitkin, M. A. 1978. The analysis of unbalanced cross-classification
    (with Discussion). J. Roy. Stat. Soc. A 141, 195--223.

    cheers, jari oksanen
    --
    Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
    Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
    email jari.oksanen at oulu.fi, homepage http://cc.oulu.fi/~jarioksa/


  • Prof Brian D Ripley at Mar 1, 2002 at 11:22 am

    On Fri, 1 Mar 2002, Jari Oksanen wrote:

    > ripley at stats.ox.ac.uk said:
    > > A second difference is the purpose of selecting a model. AIC is
    > > intended to select a model which is large enough to include the
    > > `true' model, and hence to give good predictions. There
    > > over-fitting is not a real problem. (There are variations on
    > > AIC which do not assume some model considered is true.) This is
    > > a different aim from trying to find the `true' model or trying
    > > to find the smallest adequate model, both aims for explanation
    > > not prediction.

    > This may be a stupid question, but perhaps I won't be lashed if I
    > confess my stupidity as a preventive measure. About minimal
    > adequate model*s*: Murray Aitkin et al. have a book called
    > "Statistical Modelling in GLIM" (Oxford UP, 1989) where they tell
    > how to find a set of adequate models in glm (with GLIM), and how
    > one or *several* of these adequate models may be minimal. When I
    > read the book, I found this an attractive concept, since it showed
    > that you may have several about equally good models with different
    > terms, although the usual selection procedures (including best
    > subsets) find only one. I have quite often seen people use
    > automatic selection in several subsets and then say that the
    > subsets are different because different regressors were selected
    > -- although the same regressors could have been about as good, but
    > they were never evaluated.

    The concept is well-known: Cox for example stresses finding sets of small
    adequate models. That's yet a different aim, as often only one explanation
    is required (or accepted). There is a lot on sets of adequate models:
    Raftery's Occam's window for example (see the reference in my first post).
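
    (One implementation of Raftery's approach is the BMA package; a
    minimal sketch, assuming that package and simulated data:)

    ## Bayesian model averaging with Occam's window; OR controls the
    ## window's width (max posterior odds ratio among retained models).
    library(BMA)

    set.seed(1)
    x <- matrix(rnorm(100 * 5), 100, 5)
    y <- x[, 1] + 0.5 * x[, 3] + rnorm(100)

    fit <- bicreg(x, y, OR = 20)
    summary(fit)  # posterior model probabilities for the retained set
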
    > Now the question: Aitkin's procedure would be very easy to perform
    > in R (well, it was easy even in dear old GLIM!), but I have hardly
    > seen it used.

    `dear' as in `expensive' is my memory.

    > Is there a reason for this? Is there something dubious in minimal
    > adequate models that makes them a no-no, an Erlkönig that catches
    > us innocent children?

    Not in general, but the lack of adoption of the method is a fair
    indication of how it was respected. I've now forgotten the technical
    flaws.

    > Bibliographic note: I know the procedure from the Aitkin et al.
    > book, and haven't checked the original references. These sources
    > are cited in the book:
    >
    > Aitkin, M. A. 1974. Simultaneous inference and choice of variable
    > subsets in multiple regression. Technometrics 16, 221--227.
    >
    > Aitkin, M. A. 1978. The analysis of unbalanced cross-classification
    > (with Discussion). J. Roy. Stat. Soc. A 141, 195--223.

    I suggest you do read that paper, especially the discussion. I use it as a
    case study in my MSc class on how *not* to do model selection. It's a very
    good illustration of many of the points of my first posting against fully
    automated procedures.

    There are several analyses of that example in MASS, with alternative
    models selected and spotting many things that Aitkin overlooked. Do read
    Bill Venables' commentaries in MASS too.


    --
    Brian D. Ripley,                  ripley at stats.ox.ac.uk
    Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
    University of Oxford              Tel: +44 1865 272861 (self)
    1 South Parks Road                     +44 1865 272860 (secr)
    Oxford OX1 3TG, UK                Fax: +44 1865 272595

  • Frank, Murray at Mar 1, 2002 at 6:57 pm
    Thanks for the most informative and helpful feedback.

    Professor Ripley wrote:
    (most of his message has been edited out)
    > There are big differences between regression with only continuous
    > variates, and regression involving hierarchies of factors.
    > step/stepAIC include the latter, the rest do not.

    In much of Venables and Ripley, bootstrapping keeps popping up. Is there a
    reason not to run step/stepAIC repeatedly on bootstrapped samples from the
    original data? On the face of it, bootstrapping seems intuitively appealing
    in this context. (Would some form of cross-validation on subsamples be
    better?)
    > But generally model averaging (which you have not mentioned and is
    > for regression a form of shrinkage) seems to have most support for
    > prediction.

    What do you mean by model averaging? It does not seem to match the
    discussion of model selection that I found in Venables and Ripley
    (i.e., pages 186-188).

    > Lots of hyperbolic claims, no references. But I suspect this is
    > `ex-LSE' methodology, associated with Hendry's group (as PcGive
    > and Ox are), and there is a link to Hendry (who is in Oxford).

    Quite right. It is the Hendry group. As far as I can figure out, the
    main specific references are:

    Hoover, K. D., and Perez, S. J. (1999). Data mining reconsidered:
    Encompassing and the general-to-specific approach to specification
    search. Econometrics Journal, 2, 167-191.

    Hoover, K. D., and Perez, S. J. (2001). Truth and robustness in
    cross-country growth regressions. Unpublished paper, Economics
    Department, University of California, Davis.

    > It has a different aim, I believe. Certainly `effectiveness' has
    > to be assessed relative to a clear aim, and simulation studies
    > with true models don't seem to me to have the right aim.

    As suggested, the Hoover and Perez papers are basically simulation
    studies where finding a true model was the aim. The working paper on
    growth regressions tries to go further, and seems to have
    reasonable-sounding economic conclusions.

    > Statisticians of the Box/Cox/Tukey generation would say that
    > effectiveness in deriving scientific insights was the real test
    > (and I recall hearing that from those I named).

    It is hard to argue with that claim. But it is equally hard to see it as
    complete. How do we define "scientific insight"? Or is it one of those cases
    of: "I don't know how to define it, but I know it when I see it"?

    Murray Z. Frank
    B.I. Ghert Family Foundation Professor
    Strategy & Business Economics
    Faculty of Commerce
    University of British Columbia
    Vancouver, B.C.
    Canada V6T 1Z2

    phone: 604-822-8480
    fax: 604-822-8477
    e-mail: Murray.Frank at commerce.ubc.ca
  • Roger Koenker at Mar 2, 2002 at 4:47 pm

    On Fri, 1 Mar 2002, Brian Ripley wrote:
    > Statisticians of the Box/Cox/Tukey generation would say that
    > effectiveness in deriving scientific insights was the real test
    > (and I recall hearing that from those I named).

    A nice formulation of this viewpoint that happens to be lying on my desk at
    the moment is the following:

    A final general comment is that the discussion above is of the question
    of how to reach conclusions about parameters in a model on which we are
    agreed. It seems to me, however, that a more important matter is how to
    formulate more realistic models that will enable scientifically more
    searching questions to be asked of data.

    D. R. Cox (p. 53), discussion of L. J. Savage (1962), "Subjective
    probability and statistical practice", Methuen.




    Roger Koenker
    Department of Economics
    University of Illinois
    Champaign, IL 61820
    url:   http://www.econ.uiuc.edu
    email: roger at ysidro.econ.uiuc.edu
    vox:   217-333-4558
    fax:   217-244-6678


