On 13-Jan-05 Vito Ricci wrote:

Dear R-Users,

How can I use chisq.test() as a goodness of fit test?

Reading the man page, I have some doubts that this kind of test is

available with this function. Am I wrong?

X2=sum(((O-E)^2)/E)

O=empirical frequencies

E=expected freq. calculated with the model (such as

normal distribution)

See:

http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm

for X2 used as a goodness of fit test.

It is not conspicuous in "?chisq.test", though it is in fact

the case, that chisq.test() can perform the sort of test you

are looking for. No doubt this is a result of so much space

being devoted to the contingency table case.

However, if you use it in the form

chisq.test(x, p=p)

where x is a vector of counts in "bins" and p is a vector,

of the same length as x, of the probabilities that a random

observation will fall in the various bins, then it is that

sort of test.
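
As a minimal sketch of that form, here is a call with made-up counts

(the numbers are illustrative assumptions, not data from this thread).

The probabilities are passed by name, since the second positional

argument of chisq.test() is y, not p:

```r
# Illustrative counts in k = 4 bins (made-up numbers)
x <- c(18, 30, 27, 25)

# Hypothesised probability of falling in each bin; must sum to 1
p <- c(0.2, 0.3, 0.3, 0.2)

# Goodness-of-fit test of the counts against p
res <- chisq.test(x, p = p)

res$statistic   # Pearson X2 = sum((O-E)^2/E), with E = sum(x)*p
res$parameter   # degrees of freedom used by chisq.test(): k - 1
res$p.value
```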

So, for example, if you dissect the range of X into k intervals

(-Inf,X1], (X1,X2], ... , (X[k-2],X[k-1]], (X[k-1],Inf),

let N1, N2, ... , Nk be the numbers of observations in these

intervals,

let

x = c(N1,...,Nk)

p = c(pnorm(X1),

pnorm(c(X2,...,X[k-1]))-pnorm(c(X1,...,X[k-2])),

1-pnorm(X[k-1]) )

then

chisq.test(x, p=p)

will test the goodness of fit of the normal distribution.

(Note that the above is schematic pseudo-R code, not real

R code!)
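
For what it is worth, the schematic version might be turned into

runnable R along the following lines. This is only a sketch: the

simulated sample y, the cut points and the variable names are my own

assumptions for illustration, and the null hypothesis is a fully

specified N(0,1), so no parameters are estimated.

```r
set.seed(1)                        # simulated data, for illustration only
y <- rnorm(200)                    # sample to be tested against N(0,1)

cuts   <- c(-1.5, -0.5, 0.5, 1.5)  # X1, ..., X[k-1]; here k = 5 bins
breaks <- c(-Inf, cuts, Inf)

# N1, ..., Nk: counts in (-Inf,X1], (X1,X2], ..., (X[k-1],Inf)
x <- as.vector(table(cut(y, breaks = breaks)))

# Bin probabilities under N(0,1): differences of the CDF at the breaks
p <- diff(pnorm(breaks))

# p must be named: the second positional argument of chisq.test() is y
gof <- chisq.test(x, p = p)
gof
```

Using diff(pnorm(breaks)) with -Inf and Inf as the outer breaks gives

the first, middle and last probabilities in one step.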

However, this use of chisq.test(x, p=p) is limited (as far

as I can see) to the case where no parameters have been

estimated in choosing the distribution from which p is

calculated, and so will be based on the wrong number

of degrees of freedom if the distribution is estimated

from the data. I cannot see any provision for specifying

either the degrees of freedom, or the number of parameters

estimated for p, in the documentation for chisq.test().

So in the latter case you are better off doing it directly.

This is not more difficult, since the hard work is in

calculating the elements of p. After that, with E=N*p,

X2 <- sum(((O-E)^2)/E)

has approximately the chi-squared distribution with df=(k-1-r) d.f.,

where k is the number of "bins" and r is the number of parameters

that have been estimated (one further d.f. is lost because the

counts must sum to N). So the P-value is 1-pchisq(X2,df).
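
A sketch of the direct calculation for a normal distribution whose

mean and SD are estimated from the data (so r = 2) might look like

this. The simulated data, the choice of k = 6 equal-probability bins

and the variable names are assumptions for illustration. The degrees

of freedom are taken as k-1-r: one is lost because the counts must

sum to N, and r more for the estimated parameters. With parameters

estimated from the raw (ungrouped) data this reference distribution

is itself only an approximation.

```r
set.seed(2)                          # simulated data, for illustration only
y <- rnorm(200, mean = 10, sd = 2)
N <- length(y)

# r = 2 parameters estimated from the data
m <- mean(y)
s <- sd(y)
r <- 2

# k bins with equal probability under the fitted normal;
# qnorm(0) and qnorm(1) give -Inf and Inf as the outer breaks
k <- 6
breaks <- qnorm((0:k) / k, mean = m, sd = s)

O <- as.vector(table(cut(y, breaks = breaks)))   # observed counts
p <- diff(pnorm(breaks, mean = m, sd = s))       # fitted bin probabilities
E <- N * p                                       # expected counts

X2 <- sum((O - E)^2 / E)
df <- k - 1 - r
p.value <- pchisq(X2, df, lower.tail = FALSE)    # same as 1 - pchisq(X2, df)

c(X2 = X2, df = df, p.value = p.value)
```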

Best wishes,

Ted.

--------------------------------------------------------------------

E-Mail: (Ted Harding) <ted.harding@nessie.mcc.ac.uk>

Fax-to-email: +44 (0)870 094 0861 [NB: New number!]

Date: 13-Jan-05 Time: 18:30:58

------------------------------ XFMail ------------------------------