# glm.fit: fitted probabilities numerically 0 or 1 occurred
Hi all,

I am bootstrapping a logistic regression using the glm function, and I get the warnings:

glm.fit: fitted probabilities numerically 0 or 1 occurred and
glm.fit: algorithm did not converge

I have read some discussion of this issue in the mailing list archives, and I can guess at the problem: my data contain one or maybe two outliers. Do the warnings arise from these extreme values, or from something else, such as the maximum-likelihood estimation itself? Is there any way to fix this problem?

Best regards

Ufuk

## Replies

• S Ellison, Mar 21, 2012 at 11:30 am

> I get the warnings:
> glm.fit: fitted probabilities numerically 0 or 1 occurred and
> glm.fit: algorithm did not converge
> .....
> Is there any way to fix this problem?
There are two separate issues.
One is the appearance of fitted values at 0 or 1.
The other is the lack of convergence.

The first is usually not fatal; it means that the probabilities are so close to 0 or 1 that a double precision value can't distinguish them from 0 or 1. Often that's a transient condition during iteration and the final fitted values are inside (0,1), but final fitted values can also be out there if you have continuous predictor values a long way out; by itself, that usually won't stop a glm.

The second is a bit more problematic. Sometimes it's just that you need to increase the maximum number of iterations (see the control= argument and ?glm.control). That is always worth a try - use some absurdly high number like 1000 instead of the default 25 and go find some coffee while it thinks about it. If that solves your problem then you're OK, or at least as OK as you can be with a data set that hard to fit.
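
For instance, a minimal sketch of that suggestion (y, x and mydata are placeholder names, not from the thread):

# Raise the IRLS iteration limit from the default of 25.
fit <- glm(y ~ x, family = binomial, data = mydata,
           control = glm.control(maxit = 1000))
fit$converged   # TRUE if the fit converged within the new limit
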
But if you're bootstrapping with some anomalous values, it is also possible that some of your bootstrap samples contain too high a proportion of anomalies, in which case there may be no sensible optimum within reach.

One way of dealing with that in a bootstrap or other simulation context is to check the 'converged' value and, if it is FALSE, return an NA for your statistic. But of course that is a form of censoring; if you have a high proportion of such instances you'd be on very thin ice drawing conclusions.
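
In a bootstrap loop that check might look something like this sketch (again with placeholder names; which statistic to return is up to you):

# Return NA for resamples on which the fit did not converge.
boot_stat <- function(data, idx) {
  fit <- suppressWarnings(
    glm(y ~ x, family = binomial, data = data[idx, ]))
  if (!fit$converged) return(NA)   # censor this resample
  coef(fit)[["x"]]                 # e.g. the slope as the statistic
}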

S Ellison

• Ufuk, Mar 21, 2012 at 2:22 pm

Dear Ellison,

Your explanation is clear, and now I know what to do. Your suggestion about finding some coffee while the simulation runs is a good one =)

Best regards

Ufuk
• Ted Harding, Mar 21, 2012 at 3:35 pm

On 21-Mar-2012 S Ellison wrote:
> [snip: S Ellison's reply, quoted in full above]
There is a further general issue. The occurrence of

glm.fit: fitted probabilities numerically 0 or 1 occurred and
glm.fit: algorithm did not converge

is usually a symptom of what is called "linear separation".
The simplest instance of this is when there is a single
predictor variable X whose values are, say, ordered as
X[1] < X[2] < ... < X[n-1] < X[n].

If it should happen that the values Y[1],...,Y[n] of the 0/1
response variable Y are all equal to 0 for X[1],...,X[k]
and all equal to 1 for X[k+1],...,X[n], and you are fitting
a glm(Y ~ X, family = binomial) (which is what you use for
logistic regression), then the formula being fitted is

log(P(Y=1|X)/P(Y=0|X)) = (X - m)/s for some values of m and s.

In the circumstances described, for any value of m between

X[k] < m < X[k+1]

the fit can force P(Y=1|X) --> 0 for X = X[1],...,X[k]
and can force P(Y=1|X) --> 1 for X = X[k+1],...,X[n] by
pushing s --> 0 (so then the first set of (X - m)/s --> -Inf
and the second set of (X - m)/s --> +Inf), thus forcing the
probabilities predicted by the model towards agreeing exactly
with the frequencies observed in the data.

In other words, any value of m satisfying X[k] < m < X[k+1]
*separates* the Y=0 responses from the Y=1 responses: all the
Y=0 responses are for X-values less than m, and all the Y=1
responses are for values of X greater than m. Since such a
value of m is not unique (any value between X[k] and X[k+1]
will do), failure of convergence will be observed.
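
A minimal illustration (not from the thread) that reproduces this in R:

# Perfectly separated data: all Y=0 below the gap, all Y=1 above it.
# Fitting this typically emits one or both of the warnings quoted above.
X <- 1:6
Y <- c(0, 0, 0, 1, 1, 1)
fit <- glm(Y ~ X, family = binomial)
coef(fit)           # an enormous slope: the fit is pushing s towards 0
range(fitted(fit))  # fitted probabilities numerically 0 and 1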

This can certainly arise from excessively "outlying" data,
i.e. the difference X[k+1] - X[k] is much greater than the
true value of the logistic scale parameter s, so that it
is almost certain that all responses for X <= X[k] will be 0,
and all responses for X >= X[k+1] will be 1.

But it can also readily arise when there are only a few data,
i.e. n is small; and also, even with many data, if the scale
parameter is not large compared with the spacing between
X values. If you run the following code a few times, you
will see that "separation" occurs with about 30% probability.

n <- 6
X <- 1:n ; m <- (n+1)/2 ; s <- 2   # logistic location m and scale s
count <- numeric(100)
for (i in 1:100) {
  # simulate a 0/1 response with P(Y=1|X) = exp((X-m)/s)/(1+exp((X-m)/s))
  Y <- 1*(runif(n) <= exp((X - m)/s)/(1 + exp((X - m)/s)))
  # record whether the Y=0 cases are completely separated from the Y=1 cases
  count[i] <- (max(X[Y == 0]) < min(X[Y == 1]))
}
sum(count)   # number of separated samples out of 100

Similarly, with n <- 10, it occurs about 20% of the time;
for n <- 15 about 15%, and it never gets much below 15%
no matter how large n is.

The more general situation of "linear separation" is where
you have many predictor variables: there can then be a high
probability that the X-vectors for the cases with Y=0 can be
separated from the X-vectors for the cases with Y=1 by a
linear form, i.e. there is a vector of coefficients beta
such that

X[i,]%*%beta < X[j,]%*%beta whenever Y[i]==0 and Y[j]==1.

This also can arise quite readily in practice.
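
This is easy to reproduce with simulated data; a small sketch (not from the thread), in which the linear form X1 + X2 separates the classes perfectly:

# Y is determined exactly by the sign of X1 + X2, so beta = c(1, 1)
# linearly separates the Y=0 cases from the Y=1 cases.
set.seed(1)
X1 <- rnorm(20) ; X2 <- rnorm(20)
Y <- as.integer(X1 + X2 > 0)
fit <- glm(Y ~ X1 + X2, family = binomial)   # warns as before
range(fitted(fit))   # fitted values pile up at (numerically) 0 and 1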

There is no definitive procedure for coping with this situation
when it arises. What the fitting procedure is doing is just
what it is supposed to do: whenever there is separation, it
predicts P(Y=1)=0 for every X where the observed Y=0, and
P(Y=1)=1 for every X where the observed Y=1. So indeed this is
a result of maximum-likelihood estimation, in that it is seeking
values of the parameters which make the predicted probabilities
agree exactly with the observed frequencies; with separated data
that means pushing s --> 0, and the likelihood has no maximum
at any finite parameter value.

There are work-rounds which are variants of penalising small
values of the scale parameter s, so that what is maximised
is not the plain likelihood L but L moderated by the penalty,
making the penalised likelihood less than the plain likelihood
for the smaller values of s (and, for really small s, much less).

However, choosing a penalising strategy is not well defined
from a theoretical point of view, and involves an element
of arbitrary choice, namely a penalisation function which
appropriately reflects your objectives.
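
The thread names no specific penalty, but Firth's bias-reduced logistic regression is one widely used penalised-likelihood variant; here is a sketch using the logistf package (my choice of package, not a recommendation from the posters):

# Firth-type penalised likelihood yields finite estimates even under
# complete separation; the logistf package is one implementation.
# install.packages("logistf")   # if not already installed
library(logistf)
d <- data.frame(X = 1:6, Y = c(0, 0, 0, 1, 1, 1))  # separated data again
fit <- logistf(Y ~ X, data = d)   # finite, if heavily shrunken, estimates
summary(fit)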

Hoping this helps,
Ted.

PS Try searching for "linear separation" and "linear separability"
in the searchable R-help archives at

http://tolstoy.newcastle.edu.au/R/

(the old R Site Help and Archive Search at http://finzi.psych.upenn.edu/
is no longer of much use).

-------------------------------------------------
E-Mail: (Ted Harding) <ted.harding@wlandres.net>
• Ufuk, Mar 22, 2012 at 8:08 pm

Dear Ted, thanks for your help. Everything is clearer now. I read about the "linear separation" you mentioned, and my data set fits this problem well. But one question still confuses me:

Because I am bootstrapping, I cannot adequately control each individual fit, so these warnings will arise. Is it valid to interpret results produced alongside these warnings? I can at least address "glm.fit: algorithm did not converge" via glm.control, but "fitted probabilities numerically 0 or 1 occurred" will certainly still arise. Is the interpretation of results with this warning valid?

Many thanks.

Best regards

Ufuk
