On Thu, 28 Feb 2002, Frank, Murray wrote:

> Hi,
>
> I am trying to understand the alternative methods that are available
> for selecting variables in a regression without simply imposing my own
> bias (having "good judgement"). The methods implemented in leaps and
> step and stepAIC seem to fall into the general class of stepwise
> procedures. But these are commonly condemned for inducing overfitting.

There are big differences between regression with only continuous
variates, and regression involving hierarchies of factors. step/stepAIC
include the latter; the rest do not.

A second difference is the purpose of selecting a model. AIC is intended
to select a model which is large enough to include the `true' model, and
hence to give good predictions. There, over-fitting is not a real
problem. (There are variations on AIC which do not assume that some
model considered is true.) This is a different aim from trying to find
the `true' model or trying to find the smallest adequate model, both
aims for explanation, not prediction. AIC is often criticised
(`condemned') for not being good at what it does not intend to do.
[Sometimes R is, too.]
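The AIC comparison described above can be sketched in a few lines. This is not R's step/stepAIC, just a minimal numpy illustration of the Gaussian special case, where up to an additive constant AIC = n*log(RSS/n) + 2p, with p the number of estimated coefficients; the data and forward-selection loop are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))                              # candidate predictors
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)   # X[:, 2] is pure noise

def aic(cols):
    """AIC (up to an additive constant) of OLS with intercept on X[:, cols]."""
    A = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]

# Greedy forward selection: add the predictor that most reduces AIC,
# stop when no addition improves it.
selected = []
while True:
    candidates = [c for c in range(X.shape[1]) if c not in selected]
    if not candidates:
        break
    best_c = min(candidates, key=lambda c: aic(selected + [c]))
    if aic(selected + [best_c]) >= aic(selected):
        break
    selected.append(best_c)

print(sorted(selected))
```

With signal this strong the two informative predictors are selected; the noise column may or may not enter, which is exactly the "large enough to include the true model" behaviour described above.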

Shrinkage methods have their advocates for good predictions (including
me), but they are a different class of statistical methods, that is
*not* regression. They too have issues of selection, usually how much to
shrink and often how to calibrate equal shrinkage across predictors. In
ridge regression choosing the ridge coefficient is not easy, and depends
on the scaling of the variables. In the neural networks field, shrinkage
is widely used.
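The scaling point is easy to demonstrate. A minimal numpy sketch (synthetic data, penalty lambda chosen arbitrarily): the ridge solution beta = (X'X + lambda*I)^{-1} X'y changes substantively when a predictor is rescaled, whereas the unpenalised OLS fit merely rescales the corresponding coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, 0.5]) + 0.1 * rng.normal(size=n)

def ridge(X, y, lam):
    """Ridge estimate (no intercept): solve (X'X + lam*I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

lam = 10.0
b = ridge(X, y, lam)

# Rescale the first predictor by 100 (e.g. metres -> centimetres).
Xs = X.copy()
Xs[:, 0] *= 100.0
bs = ridge(Xs, y, lam)

# Under OLS the first coefficient would simply divide by 100 and the
# fitted values would be unchanged; under ridge the fit itself changes:
print(np.allclose(X @ b, Xs @ bs))   # False: ridge is not scale-equivariant
```

This is why ridge implementations usually standardise the predictors first, which is itself a (debatable) choice of how to calibrate equal shrinkage across predictors.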

> In Hastie, Tibshirani and Friedman "The Elements of Statistical
> Learning" chapter 3, they describe a number of procedures that seem
> better.

I think that is a quite selective account.

> The use of cross-validation in the training stage presumably helps
> guard against overfitting. They seem particularly favorable to
> shrinkage through ridge regressions, and to the "lasso". This may not
> be too surprising, given the authorship. Is the lasso "generally
> accepted" as being a pretty good approach? Has it proved its worth on
> a variety of problems? Or is it at the "interesting idea" stage? What,
> if anything, would be widely accepted as being sensible -- apart from
> having "good judgement"?

Depends on the aim. If you look at the account in Venables & Ripley you
will see many caveats about any automated method: all statistical
problems (outside textbooks) come with a context which should be used in
selecting variables if the aim is explanation, and perhaps also if it is
prediction. You should use what you know about the variables and the
possible mechanisms, especially to select derived variables. But
generally model averaging (which you have not mentioned, and which is
for regression a form of shrinkage) seems to have most support for
prediction.
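The sense in which model averaging acts as shrinkage can be made concrete. A minimal numpy sketch with invented data and equal weights (weights from AIC or posterior model probabilities are the more usual choice): averaging the fits of a large model and a smaller nested one pulls the large model's fit partway toward the small one.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = rng.normal(size=(n, 3))
y = X[:, 0] + 0.2 * rng.normal(size=n)

def ols_fit(X, y):
    """Fitted values from OLS with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ beta

fit_small = ols_fit(X[:, :1], y)   # model using predictor 0 only
fit_large = ols_fit(X, y)          # model using all three predictors

# Equal-weight average of the two models' predictions:
fit_avg = 0.5 * fit_small + 0.5 * fit_large

# The averaged fit sits halfway between the two, i.e. the large model's
# fit is shrunk toward the small model's fit:
print(np.allclose(fit_avg - fit_small, 0.5 * (fit_large - fit_small)))
```

Because the extra fitted component of the large model is orthogonal to the small model's fit, the averaged predictions are also less variable than the large model's, which is the usual argument for averaging when the aim is prediction.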

> In econometrics there is a school (the "LSE methodology") which argues
> for what amounts to stepwise regressions combined with repeated tests
> of the properties of the error terms. (It is actually a bit more
> complex than that.) This has been coded in the program PCGets:
> (http://www.pcgive.com/pcgets/index.html?content=/pcgets/main.html)

Lots of hyperbolic claims, no references. But I suspect this is `ex-LSE'
methodology, associated with Hendry's group (as PcGive and Ox are), and
there is a link to Hendry (who is in Oxford).

> If anyone knows how this compares in terms of effectiveness to the
> methods discussed in Hastie et al., I would really be very interested.

It has a different aim, I believe. Certainly `effectiveness' has to be
assessed relative to a clear aim, and simulation studies with true
models don't seem to me to have the right aim. Statisticians of the
Box/Cox/Tukey generation would say that effectiveness in deriving
scientific insights was the real test (and I recall hearing that from
those I named).

Chapter 2 of my `Pattern Recognition and Neural Networks' takes a much
wider view of the methods available for model selection, and their
philosophies. Specifically for regression, you might take a look at
Frank Harrell's book.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject!) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._