On 21-Mar-2012 S Ellison wrote:

I get the errors;

glm.fit: fitted probabilities numerically 0 or 1 occurred and

glm.fit: algorithm did not converge

.....

Is there any way to to

fix this problem?

There are two separate issues.

One is the appearance of fitted values at 0 or 1.

The other is the lack of convergence.

The first is usually not fatal; it means that the probabilities

are so close to 0 or 1 that a double precision value can't

distinguish them from 0 or 1.

Often that's a transient condition during iteration and the

final fitted values are inside (0,1), but final fitted values

can also be out there if you have continuous predictor values

a long way out; by itself, that usually won't stop a glm.

The second is a bit more problematic. Sometimes it's just that

you need to increase the maximum number of iterations (see the

control= argument and ?glm.control). That is always worth a try

- use some absurdly high number like 1000 instead of the default

25 and go find some coffee while it thinks about it. If that

solves your problem then you're OK, or at least as OK as you can

be with a data set that hard to fit.

But if you're bootstrapping with some anomalous values it is

also possible that some of your bootstrapped sets have too high

a proportion of anomalies, and under those conditions it's

possible that there could be no sensible optimum within reach.

One way of dealing with that in a boostrap or other simulation

context is to check the 'converged' value and if it's FALSE,

return an NA for your statistic. But of course that is a form

of censoring; if you have a high proportion of such instances

you'd be on very thin ice drawing conclusions.

S Ellison

There is a further general issue. The occurence of

glm.fit: fitted probabilities numerically 0 or 1 occurred and

glm.fit: algorithm did not converge

is usually a symptom of what is called "linear separation".

The simplest instance of this is when there is a single

predictor variable X whose values are, say, ordered as

X[1] < X[2] < ... < X[n-1] < X[n].

If it should happen that the values Y[1] ... Y[n] of the 0/1

response variable Y are all equal to 0 for X[1],...,X[k]

and all equal to 1 for X[k+1],...,X[n], and you are fitting

a glm(Y~X,family=binomial) (which is what you use for logistic

regression), then the formula being fitted is

log(P(Y=1|X)/(Y=0|X)) = (X - m)/s for some values of m and s.

In the circumstances described, for any value of m between

X[k] < m < X[k+1]

the fit can force P[Y=1|X] --> 0 for X = X[1],...,X[k]

and can force P[Y=1|X] --> 1 for X = X[k+1],...,X[n] by

pushing s --> 0 (so then the first set of (X - m)/s --> -Inf

and the second set of (X - m)/s --> +Inf), thus forcing the

probabilities predicted by the model towards agreeing exactly

with the frequencies observed in the data.

In other words, any value of m satisfying X[k] < m < X[k+1]

*separates* the Y=0 responses from the Y=1 responses: all the

Y=0 responses are for X-values less than m, and all the Y=1

responses are for values of X greater than m. Since such a

value of m is not unique (any value between X[k] and X[k+1]

will do), failure of convergence will be observed.

This can certainly arise from excessively "outlying" data,

i.e. the difference X[k+1] - X[k] is much greater than the

true value of the logistic scale parameter s, so that it

is almost certain that all responses for X <= X[k] will be 0,

and all responses for X >= X[k+1] will be 1.

But it can also readily arise when there are only a few data,

i.e. n is small; and also, even with many data, if the scale

paramater is n ot large compared with the spacing between

X values. If you run the following code a few times, you

will see that "separation" occurs with about 30% probability.

n <- 6

X <- (1:n) ; m <- (n+1)/2 ; s <- 2

count <- numeric(100)

for(i in (1:100)) {

Y <- 1*(runif(n) <= exp((X-m)/s)/(1+exp((X-m)/s)))

count[i] <- (max(X[Y==0]) < min(X[Y==1]))

}

sum(count)

Similarly, with n <- 10, it occurs about 20% of the time;

for n <- 15 about 15%, and it never gets much below 15%

no matter how large n is.

The more general situation of "linear separation" is where

you have many predictor variables, when there can be a high

probability that the random outcomes Y=0 label one set of

cases, the Y=1 label another set of cases, and the X-vectors

for Y=0 can be separated from the X-vectors for Y=1 by a

linear form, i.e. there is a vector of coefficients beta

such that

X[for any Y==0]%*%beta < X[for any Y==1]%*%beta

This also can arise quite readily in practice.

There is no definitive procedure for coping with this situation

when it arises. What the fitting procedure is doing is just

what it is supposed to do, namely predicting P(Y=1)=0 for

every X such that the observed Y=0, and predicting P(Y=1)=1

for every X such that the observed Y=1 whenever it can (i.e.

whenever there is separation). So indeed this is a result

of maximum-likelihood estimation in that it is seeking values

of the parameters

There are work-rounds which are variants of penalising small

values of the scale paramater s, so that what is being

maximised is not the plain likelihood L but L moderated

by the penalty so that for the smaller values of s the

penalised likelihood will be less than the plain likelihood

(and, for really small s, much less).

However, choosing a penalising strategy is not well-defined

from a theoretical point of view, and will involve an element

of arbitrary choice, i.e. a penalisation function which

appropriately reflects your objectives.

Hoping this helps,

Ted.

PS Try searching for "linear separation" and "linear separability"

in the searchable R-help archives at

http://tolstoy.newcastle.edu.au/R/(the old R Site Help and Archive Search at

http://finzi.psych.upenn.edu/is no longer of much use).

-------------------------------------------------

E-Mail: (Ted Harding) <

ted.harding@wlandres.net>

Date: 21-Mar-2012 Time: 15:35:00

This message was sent by XFMail