FAQ
Dear R-Users,

How can I use chisq.test() as a goodness of fit test?
Reading the man page, I have some doubts that this kind of test is
available with this function. Am I wrong?

X2 = sum((O-E)^2/E)

O = empirical frequencies
E = expected frequencies calculated with the model (such as
a normal distribution)

See:
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
for X2 used as a goodness of fit test.

Any help will be appreciated.
Thanks a lot. Bye.
Vito

=====
Become builders of solutions

"The business of the statistician is to catalyze
the scientific learning process."
George E. P. Box

Top 10 reasons to become a Statistician

1. Deviation is considered normal
2. We feel complete and sufficient
3. We are 'mean' lovers
4. Statisticians do it discretely and continuously
5. We are right 95% of the time
6. We can legally comment on someone's posterior distribution
7. We may not be normal, but we are transformable
8. We never have to say we are certain
9. We are honestly significantly different
10. No one wants our jobs

Visit the portal http://www.modugno.it/
and in particular the section on Palese http://www.modugno.it/archivio/palese/


• at Jan 13, 2005 at 6:30 pm

On 13-Jan-05 Vito Ricci wrote:
> Dear R-Users,
>
> How can I use chisq.test() as a goodness of fit test?
> Reading the man page, I have some doubts that this kind of test is
> available with this function. Am I wrong?
>
> X2 = sum((O-E)^2/E)
>
> O = empirical frequencies
> E = expected frequencies calculated with the model (such as
> a normal distribution)
>
> See:
> http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
> for X2 used as a goodness of fit test.

It is not conspicuous in "?chisq.test", but chisq.test() can in fact
perform the sort of test you are looking for. No doubt this is
obscured by the amount of space devoted to the contingency-table
case.

However, if you use it in the form

chisq.test(x, p = p)

where x is a vector of counts in "bins" and p is a vector,
of the same length as x, of the probabilities that a random
observation will fall in the various bins, then it is that
sort of test.
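A minimal sketch of this form, using a hypothetical die-rolling example (the counts are made up for illustration). Note that `p` has to be passed by name: the second positional argument of `chisq.test()` is `y`, so `chisq.test(x, p)` would coerce both vectors to factors and run a contingency-table test instead. The last line checks that the reported statistic agrees with X2 = sum((O-E)^2/E) computed by hand.

```r
# Hypothetical counts from 60 rolls of a die: one count per face ("bin")
x <- c(8, 12, 9, 11, 7, 13)
# Fully specified null model: each face has probability 1/6
p <- rep(1/6, 6)

# Pass p by name; positionally, the second argument would be taken as y
res <- chisq.test(x, p = p)
print(res)

# The same statistic computed directly, with E = N * p
E <- sum(x) * p
X2 <- sum((x - E)^2 / E)
all.equal(unname(res$statistic), X2)
```

Here every expected count is 10, so X2 = 28/10 = 2.8 on 6 - 1 = 5 degrees of freedom.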

So, for example, if you divide the range of X into k intervals
(-Inf,X1], (X1,X2], ... , (X[k-2],X[k-1]], (X[k-1],Inf),
let N1, N2, ... , Nk be the numbers of observations in these
intervals, and let

x = c(N1,...,Nk)

p = c(pnorm(X1),
      pnorm(c(X2,...,X[k-1])) - pnorm(c(X1,...,X[k-2])),
      1 - pnorm(X[k-1]))

then

chisq.test(x, p = p)

will test the goodness of fit of the (standard) normal distribution.
(Note that the above is schematic pseudo-R code, not real
R code!)
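A runnable version of the schematic above might look like the following. The simulated sample, the seed and the cut points are assumptions chosen for illustration. `cut()` with infinite outer breaks produces the intervals (-Inf, X1], (X1, X2], ..., (X[k-1], Inf), and `diff(pnorm(...))` gives all k bin probabilities in one step. As in the text, the normal distribution is fully specified (mean 0, sd 1), so nothing is estimated from the data.

```r
set.seed(1)
y <- rnorm(200)                    # illustrative sample; substitute your own data

# Interior cut points X1, ..., X[k-1] (hypothetical choice), giving k = 5 bins
cuts <- c(-1.5, -0.5, 0.5, 1.5)
breaks <- c(-Inf, cuts, Inf)

# Counts N1, ..., Nk; table() of a factor keeps empty bins as zeros
x <- as.vector(table(cut(y, breaks = breaks)))

# Bin probabilities under the fully specified N(0, 1) model
p <- diff(pnorm(breaks))

res <- chisq.test(x, p = p)
res
```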

However, this use of chisq.test(x,p) is limited (as far
as I can see) to the case where no parameters have been
estimated in choosing the distribution from which p is
calculated, and so will be based on the wrong number
of degrees of freedom if the distribution is estimated
from the data. I cannot see any provision for specifying
either the degrees of freedom, or the number of parameters
estimated for p, in the documentation for chisq.test().

So in the latter case you are better off doing it directly.
This is not more difficult, since the hard work is in
calculating the elements of p. After that, with E = N*p (where N is
the total number of observations),

X2 <- sum(((O-E)^2)/E)

has approximately the chi-squared distribution with df = k-1-r degrees
of freedom, where k is the number of "bins" and r is the number of
parameters that have been estimated. The p-value is then 1-pchisq(X2, df).
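The direct computation can be wrapped in a small helper. The function name `gof_chisq` and its arguments are hypothetical, introduced only for this sketch. It uses df = k - 1 - r and `pchisq(..., lower.tail = FALSE)`, which is equivalent to 1 - pchisq(X2, df) but more accurate for very small p-values. With r = 0 it should reproduce what `chisq.test(x, p = p)` reports.

```r
# Hypothetical helper: Pearson chi-squared goodness-of-fit test when r parameters
# of the model behind p were estimated (ideally from the binned data).
#   O: observed counts per bin; p: fitted bin probabilities; r: estimated parameters
gof_chisq <- function(O, p, r = 0) {
  stopifnot(length(O) == length(p), abs(sum(p) - 1) < 1e-8)
  N  <- sum(O)
  E  <- N * p                       # expected counts
  X2 <- sum((O - E)^2 / E)
  df <- length(O) - 1 - r
  c(X2 = X2, df = df, p.value = pchisq(X2, df, lower.tail = FALSE))
}

# With r = 0 this matches chisq.test() on the same data
O <- c(8, 12, 9, 11, 7, 13)
p <- rep(1/6, 6)
gof_chisq(O, p, r = 0)
chisq.test(O, p = p)$p.value
```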

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding@nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
Date: 13-Jan-05 Time: 18:30:58
------------------------------ XFMail ------------------------------
• at Jan 13, 2005 at 8:12 pm

(Ted Harding) <ted.harding@nessie.mcc.ac.uk> writes:

> This is not more difficult, since the hard work is in
> calculating the elements of p. After that, with E = N*p (where N is
> the total number of observations),
>
> X2 <- sum(((O-E)^2)/E)
>
> has approximately the chi-squared distribution with df = k-1-r degrees
> of freedom, where k is the number of "bins" and r is the number of
> parameters that have been estimated. The p-value is then 1-pchisq(X2, df).

As Achim indicated, this only works if you estimate the parameters
from the binned data (and I suspect that, in principle, you also need to
have decided on the bins in advance). My old Stat-1 notes had a claim that
if you used the mean and variance of unbinned data to estimate the
normal distribution, then the X2 would be between chi-squares with
k-2 and k-1 d.f.
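One way to follow this advice is grouped maximum likelihood: fix the bins first, then maximize the multinomial likelihood of the bin counts over the normal parameters. The sketch below does that with `optim()`. The sample, seed, cut points and the log-sd parametrization are assumptions for illustration. Because both parameters are estimated from the binned data, the reference distribution has k - 1 - 2 degrees of freedom.

```r
set.seed(42)
y <- rnorm(300, mean = 10, sd = 2)     # illustrative sample

# Cut points fixed in advance (hypothetical choice), giving k = 6 bins
cuts <- c(7, 8.5, 10, 11.5, 13)
O <- as.vector(table(cut(y, breaks = c(-Inf, cuts, Inf))))

# Bin probabilities under N(mu, sigma^2); theta = (mu, log(sigma))
bin_probs <- function(theta) {
  diff(pnorm(c(-Inf, cuts, Inf), mean = theta[1], sd = exp(theta[2])))
}

# Negative multinomial log-likelihood of the binned counts (grouped ML)
negloglik <- function(theta) -sum(O * log(bin_probs(theta)))

start <- c(mean(y), log(sd(y)))        # raw-data estimates used only as a start
fit <- optim(start, negloglik)
p_hat <- bin_probs(fit$par)

E  <- sum(O) * p_hat
X2 <- sum((O - E)^2 / E)
df <- length(O) - 1 - 2                # mu and sigma estimated from the bins
pval <- pchisq(X2, df, lower.tail = FALSE)
c(mu = fit$par[1], sigma = exp(fit$par[2]), X2 = X2, df = df, p.value = pval)
```

Using the X2 statistic itself as the objective in place of `negloglik` would give minimum chi-squared estimation instead.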

Incidentally, my .02 DKK is that you're more likely to want a test
against smoother alternatives than against the omnibus alternative
implied by the chi-square. For instance, if there are digit-preference
effects in the distribution (e.g., some weight measurements rounded to
the nearest half kg), they can produce a highly significant X2, even
though the deviation is of a kind that has little importance for the
validity of subsequent analyses. I haven't ever seen any such tests for
the case of estimated parameters, though...

--
O__ ---- Peter Dalgaard Blegdamsvej 3
c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
• at Jan 13, 2005 at 6:39 pm

On Thu, 13 Jan 2005 18:23:37 +0100 (CET) Vito Ricci wrote:

> Dear R-Users,
>
> How can I use chisq.test() as a goodness of fit test?
> Reading the man page, I have some doubts that this kind of test is
> available with this function. Am I wrong?
>
> X2 = sum((O-E)^2/E)
>
> O = empirical frequencies
> E = expected frequencies

You can do
chisq.test(O, p = E/sum(E))
but note that this assumes that the expected frequencies/probabilities
are known (and not estimated).

> calculated with the model (such as a normal distribution)

"Normal distribution" is not a fully specified model! If you estimate
the parameters by ML from the ungrouped data, the inference will
typically not be valid. Another approach would be to estimate the
parameters by grouped ML or minimum chi-squared [...] and the
references therein.

For discrete distributions, this chi-squared statistic is more natural
(though not always without problems): see ?goodfit in package vcd.
Z
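For the discrete case, a base-R sketch along the same lines for a Poisson model is shown below. The simulated counts, the seed and the choice to pool values of 4 and above into one bin are assumptions. The single parameter lambda is estimated by grouped ML (here with `optimize()`), so df = k - 1 - 1. The `goodfit()` function in package vcd mentioned above is the ready-made route; this sketch does not depend on it.

```r
set.seed(7)
y <- rpois(150, lambda = 1.8)          # illustrative count data

# Bins 0, 1, 2, 3 and "4 or more"
O <- as.vector(table(factor(pmin(y, 4), levels = 0:4)))

# Poisson bin probabilities, with the upper tail P(Y >= 4) as the last bin
pois_probs <- function(lambda) {
  c(dpois(0:3, lambda), ppois(3, lambda, lower.tail = FALSE))
}

# Grouped ML: maximize the multinomial log-likelihood of the bin counts
negloglik <- function(lambda) -sum(O * log(pois_probs(lambda)))
lambda_hat <- optimize(negloglik, interval = c(0.01, 10))$minimum

E  <- sum(O) * pois_probs(lambda_hat)
X2 <- sum((O - E)^2 / E)
df <- length(O) - 1 - 1                # one parameter (lambda) estimated
pval <- pchisq(X2, df, lower.tail = FALSE)
c(lambda = lambda_hat, X2 = X2, df = df, p.value = pval)
```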


______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
http://www.R-project.org/posting-guide.html


Discussion overview: group r-help; category r; posted Jan 13, '05 at 5:23 pm; last active Jan 13, '05 at 8:12 pm; 4 posts from 4 users; website r-project.org; IRC #r.
