Dear R-Users,
How can I use chisq.test() as a goodness of fit test?
Reading the man page, I have some doubts that this kind of test is
available with this function. Am I wrong?
X2 = sum((O-E)^2/E)
O = observed (empirical) frequencies
E = expected frequencies calculated from the model (such as a
normal distribution)
See:
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
for the use of X2 as a goodness-of-fit test.
Any help will be appreciated.
Thanks a lot. Bye.
Vito
=====
Becoming builders of solutions
"The business of the statistician is to catalyze
the scientific learning process."
George E. P. Box
Top 10 reasons to become a Statistician
1. Deviation is considered normal
2. We feel complete and sufficient
3. We are 'mean' lovers
4. Statisticians do it discretely and continuously
5. We are right 95% of the time
6. We can legally comment on someone's posterior distribution
7. We may not be normal, but we are transformable
8. We never have to say we are certain
9. We are honestly significantly different
10. No one wants our jobs
Visit the portal http://www.modugno.it/
and in particular the section on Palese http://www.modugno.it/archivio/palese/
[R] chisq.test() as a goodness of fit test
Ted Harding at Jan 13, 2005 at 6:30 pm
It is not obvious from "?chisq.test", though it is in fact the
case, that chisq.test() can perform the sort of test you
are looking for. No doubt this is because so much of the help
page is devoted to the contingency-table case.
However, if you use it in the form
chisq.test(x, p = p)
where x is a vector of counts in "bins" and p is a vector
of the same length as x giving the probabilities that a random
observation will fall in the various bins, then it is that
sort of test. (p must be passed by name: the second positional
argument of chisq.test() is y, not p.)
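A minimal sketch of this form, using a hypothetical fair-die example (the counts are made up for illustration). The null model is fully specified, so nothing is estimated and the default df = k - 1 is appropriate. p is passed by name because the second positional argument of chisq.test() is y. The hand calculation at the end shows that chisq.test() computes the same sum((O-E)^2/E).

```r
# Hypothetical counts of faces 1-6 from 60 rolls of a die (illustrative numbers)
counts <- c(8, 9, 12, 11, 6, 14)
# Fully specified null model: a fair die, so no parameters are estimated
probs <- rep(1/6, 6)

# p is passed by name: the second positional argument of chisq.test() is y
res <- chisq.test(counts, p = probs)
res                      # X-squared = 4.2, df = 5

# The same statistic computed by hand as sum((O - E)^2 / E)
E <- sum(counts) * probs
X2_manual <- sum((counts - E)^2 / E)
X2_manual
```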
So, for example, if you dissect the range of X into k intervals
(-Inf,X1], (X1,X2], ... , (X[k-2],X[k-1]], (X[k-1],Inf),
let N1, N2, ... , Nk be the numbers of observations in these
intervals, and let
x = c(N1,...,Nk)
p = c(pnorm(X1),
pnorm(c(X2,...,X[k-1])) - pnorm(c(X1,...,X[k-2])),
1 - pnorm(X[k-1]) )
then
chisq.test(x, p = p)
will test the goodness of fit of the standard normal distribution.
(Note that the above is schematic pseudo-R code, not real
R code!)
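A runnable version of the schematic code above, offered as a sketch under stated assumptions: the sample is simulated from N(0, 1), and the cut points -1.5, -0.5, 0.5, 1.5 are arbitrary illustrative choices fixed before looking at the data. diff(pnorm(breaks)) yields the same vector as the c(pnorm(X1), ..., 1 - pnorm(X[k-1])) construction, because pnorm(-Inf) is 0 and pnorm(Inf) is 1.

```r
# Hypothetical data: 200 draws tested against the fully specified N(0, 1)
set.seed(1)
y <- rnorm(200)

# Interior cut points X1, ..., X[k-1], fixed in advance (assumed values)
cuts <- c(-1.5, -0.5, 0.5, 1.5)
breaks <- c(-Inf, cuts, Inf)   # k = 5 intervals (-Inf, X1], ..., (X[k-1], Inf)

# N1, ..., Nk: cut() builds right-closed intervals by default, and table()
# keeps empty levels, so x always has length k
x <- as.vector(table(cut(y, breaks)))

# p = c(pnorm(X1), pnorm(X2) - pnorm(X1), ..., 1 - pnorm(X[k-1]))
p <- diff(pnorm(breaks))

# p must be passed by name (the second positional argument is y)
res <- chisq.test(x, p = p)
res
```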
However, this use of chisq.test(x, p = p) is (as far as I can see)
limited to the case where no parameters have been estimated in
choosing the distribution from which p is calculated. If the
distribution is estimated from the data, the test will be based on
the wrong number of degrees of freedom, and I cannot see any
provision in the documentation for chisq.test() for specifying
either the degrees of freedom or the number of parameters estimated
for p.
So in the latter case you are better off doing it directly.
This is no more difficult, since the hard work is in
calculating the elements of p. After that, with E = N*p,
X2 <- sum(((O-E)^2)/E)
has approximately the chi-squared distribution with df = k-1-r d.f.,
where k is the number of "bins" and r is the number of parameters
that have been estimated. So compute 1 - pchisq(X2, df).
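A sketch of the direct calculation with estimated parameters, assuming a simulated normal sample, six bins fixed in advance, and r = 2 estimated parameters (mean and sd). The mean and sd are estimated here from the raw (unbinned) data, which is common practice; as the later replies in this thread point out, the chi-squared distribution with k - 1 - r df is then only an approximate reference.

```r
set.seed(2)
y <- rnorm(150, mean = 10, sd = 3)          # hypothetical sample
breaks <- c(-Inf, 6, 8, 10, 12, 14, Inf)    # k = 6 bins, chosen in advance (assumed)

O <- as.vector(table(cut(y, breaks)))        # observed counts per bin
N <- sum(O)

# r = 2 parameters estimated, here from the raw data (so the reference
# distribution below is only approximate)
mu_hat <- mean(y)
sd_hat <- sd(y)
p <- diff(pnorm(breaks, mean = mu_hat, sd = sd_hat))
E <- N * p

X2 <- sum((O - E)^2 / E)
k <- length(O)
r <- 2
df <- k - 1 - r
# pchisq(X2, df, lower.tail = FALSE) is more accurate when p is tiny
p_value <- 1 - pchisq(X2, df)
c(X2 = X2, df = df, p.value = p_value)
```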
Best wishes,
Ted.

E-Mail: (Ted Harding) <ted.harding@nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
Date: 13-Jan-05 Time: 18:30:58

Peter Dalgaard at Jan 13, 2005 at 8:12 pm
(Ted Harding) <ted.harding@nessie.mcc.ac.uk> writes:
> This is not more difficult, since the hard work is in
> calculating the elements of p. After that, with E=N*p,
> X2 <- sum(((O-E)^2)/E)
> has the chi-squared distribution with df=(k-1-r) d.f. where
> k is the number of "bins" and r is the number of parameters
> that have been estimated. So get 1-pchisq(X2,df).
As Achim indicated, this only works if you estimate the parameters
from the binned data (and I suspect that in principle you also need
to have decided on the bins in advance). My old Stat1 notes claimed
that if you used the mean and variance of the unbinned data to
estimate the normal distribution, then X2 would lie between
chi-squares with k-2 and k-1 d.f.
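The claim about unbinned estimates can be checked empirically. Below is a simulation sketch under illustrative assumptions: N(0, 1) data, n = 200 per sample, five bins fixed in advance, and mean and sd (sd() uses the n - 1 denominator) estimated from the unbinned data. It prints simulated quantiles of X2 next to chi-squared quantiles for k-3, k-2 and k-1 df, so the reader can see where the simulated distribution falls. No particular outcome is asserted here.

```r
set.seed(5)
breaks <- c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)   # k = 5 bins fixed in advance (assumed)
k <- length(breaks) - 1
n <- 200                                         # sample size per replicate (assumed)
B <- 2000                                        # number of simulated samples

X2sim <- replicate(B, {
  y <- rnorm(n)
  O <- as.vector(table(cut(y, breaks)))
  # Mean and sd estimated from the unbinned data
  E <- n * diff(pnorm(breaks, mean = mean(y), sd = sd(y)))
  sum((O - E)^2 / E)
})

# Simulated quantiles next to chi-squared quantiles for k-3, k-2 and k-1 df
probs <- c(0.50, 0.90, 0.95)
cand_df <- (k - 3):(k - 1)
out <- rbind(quantile(X2sim, probs),
             t(sapply(cand_df, function(d) qchisq(probs, df = d))))
rownames(out) <- c("simulated", paste0("chisq df = ", cand_df))
round(out, 2)
```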
Incidentally, my .02 DKK is that you're more likely to want a test
against smoother alternatives than the omnibus alternative implied by
the chi-square test. For instance, if you have digit-preference effects
in the distribution (some weight measurements rounded to the nearest
half kg, e.g.), they can produce a highly significant X2, but the
deviation is of a character that has little importance for the
validity of subsequent analyses. I haven't ever seen any such tests
for the case of estimated parameters, though...

   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
Achim Zeileis at Jan 13, 2005 at 6:39 pm
On Thu, 13 Jan 2005 18:23:37 +0100 (CET) Vito Ricci wrote:
> X2=sum((O-E)^2/E)
> O=empirical frequencies
> E=expected freq.
You can do
chisq.test(O, p = E/sum(E))
but note that this assumes that the expected frequencies/probabilities
are known (and not estimated).
> calculated with the model (such as normal distribution)
"Normal distribution" is not a fully specified model! If you estimate
the parameters by ML from the raw data, the inference will typically
not be valid. Another approach would be to estimate the parameters by
grouped ML or minimum Chi-squared instead. See also ?pearson.test from
package nortest and the references therein.
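A base-R sketch of the two estimation approaches named here, assuming a simulated normal sample and bins fixed in advance. Grouped ML minimizes the negative multinomial log-likelihood -sum(O * log(p(theta))) of the bin counts; minimum chi-squared minimizes Pearson's X2 itself. Both use optim(). The helpers bin_probs(), negloglik() and pearson_x2() are hypothetical names for illustration, and the code does not use nortest. With parameters estimated from the binned data in either way, k - 1 - r df is the appropriate asymptotic reference.

```r
set.seed(3)
y <- rnorm(150, mean = 10, sd = 3)          # hypothetical sample
breaks <- c(-Inf, 6, 8, 10, 12, 14, Inf)    # k = 6 bins fixed in advance (assumed)
O <- as.vector(table(cut(y, breaks)))        # observed bin counts
N <- sum(O)

# Hypothetical helper: bin probabilities for theta = c(mean, log(sd));
# the log scale keeps sd positive during optimisation
bin_probs <- function(theta) diff(pnorm(breaks, mean = theta[1], sd = exp(theta[2])))

# Grouped ML: negative multinomial log-likelihood of the bin counts
negloglik <- function(theta) -sum(O * log(bin_probs(theta)))

# Minimum chi-squared: Pearson's X2 as a function of theta
pearson_x2 <- function(theta) {
  E <- N * bin_probs(theta)
  sum((O - E)^2 / E)
}

start <- c(mean(y), log(sd(y)))              # raw-data estimates as starting values
fit_ml  <- optim(start, negloglik)           # Nelder-Mead by default
fit_mcs <- optim(start, pearson_x2)

df <- length(O) - 1 - 2                      # k - 1 - r with r = 2
X2_ml <- pearson_x2(fit_ml$par)
c(X2_ml = X2_ml, p_ml = pchisq(X2_ml, df, lower.tail = FALSE),
  X2_mcs = fit_mcs$value, p_mcs = pchisq(fit_mcs$value, df, lower.tail = FALSE))
```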
For discrete distributions, this Chi-squared statistic is more natural
(though not always without problems): see ?goodfit in package vcd.
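For the discrete case, here is a base-R sketch of the same calculation for a Poisson model, written by hand rather than through vcd's goodfit() so it does not rely on that package's interface. Assumptions: simulated counts, classes 0, 1, 2 and ">= 3" (the tail is pooled to avoid tiny expected counts), and lambda estimated by ML from the raw counts, so the k - 1 - r reference is again approximate.

```r
set.seed(4)
y <- rpois(100, lambda = 1.2)             # hypothetical count data
lambda_hat <- mean(y)                      # Poisson ML estimate from the raw data (r = 1)

# Classes 0, 1, 2 and ">= 3"; pooling the tail avoids very small expected counts
O <- tabulate(pmin(y, 3) + 1, nbins = 4)
p <- c(dpois(0:2, lambda_hat), ppois(2, lambda_hat, lower.tail = FALSE))
E <- sum(O) * p

X2 <- sum((O - E)^2 / E)
df <- length(O) - 1 - 1                    # k - 1 - r with k = 4 classes, r = 1
p_value <- pchisq(X2, df, lower.tail = FALSE)
c(X2 = X2, df = df, p.value = p_value)
```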
Z
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
Discussion Overview
group: r-help
categories: r
posted: Jan 13, '05 at 5:23p
active: Jan 13, '05 at 8:12p
posts: 4
users: 4
website: r-project.org
irc: #r