Grokbase Groups R r-help January 2005

[R] chisq.test() as a goodness of fit test

Vito Ricci
Jan 13, 2005 at 5:23 pm
Dear R-Users,

How can I use chisq.test() as a goodness of fit test?
Reading the man page, I have some doubts that this kind of test is
available with this function. Am I wrong?


X2 = sum((O-E)^2/E)

O=empirical frequencies
E=expected freq. calculated with the model (such as
normal distribution)

See:
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
for X2 used as a goodness of fit test.

Any help will be appreciated.
Thanks a lot. Bye.
Vito


=====
Become builders of solutions

"The business of the statistician is to catalyze
the scientific learning process."
George E. P. Box

Top 10 reasons to become a Statistician

1. Deviation is considered normal
2. We feel complete and sufficient
3. We are 'mean' lovers
4. Statisticians do it discretely and continuously
5. We are right 95% of the time
6. We can legally comment on someone's posterior distribution
7. We may not be normal, but we are transformable
8. We never have to say we are certain
9. We are honestly significantly different
10. No one wants our jobs


Visit the portal http://www.modugno.it/
and in particular the section on Palese http://www.modugno.it/archivio/palese/

3 responses

  • Ted Harding at Jan 13, 2005 at 6:30 pm

    On 13-Jan-05 Vito Ricci wrote:
    > Dear R-Users,
    >
    > How can I use chisq.test() as a goodness of fit test?
    > Reading the man page, I have some doubts that this kind of test is
    > available with this function. Am I wrong?
    >
    > X2 = sum((O-E)^2/E)
    >
    > O=empirical frequencies
    > E=expected freq. calculated with the model (such as
    > normal distribution)
    >
    > See:
    > http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
    > for X2 used as a goodness of fit test.

    It is not conspicuous in "?chisq.test", though it is in fact
    the case, that chisq.test() can perform the sort of test you
    are looking for. No doubt this is because so much of the help
    page is devoted to the contingency table case.

    However, if you use it in the form

    chisq.test(x, p = p)

    where x is a vector of counts in "bins" and p is a vector,
    of the same length as x, of the probabilities that a random
    observation will fall in the various bins, then it is that
    sort of test. (The argument p must be named: the second
    positional argument of chisq.test() is y.)

    So, for example, if you dissect the range of X into k intervals
    (-Inf,X1], (X1,X2], ... , (X[k-2],X[k-1]], (X[k-1],Inf),
    let N1, N2, ... , Nk be the numbers of observations in these
    intervals,
    let

    x = c(N1,...,Nk)

    p = c(pnorm(X1),
          pnorm(c(X2,...,X[k-1])) - pnorm(c(X1,...,X[k-2])),
          1-pnorm(X[k-1]) )

    then

    chisq.test(x, p = p)

    will test the goodness of fit of the normal distribution (here
    the standard normal, since pnorm() is called with its default
    mean and sd).
    (Note that the above is schematic pseudo-R code, not real
    R code!)
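
A runnable version of the schematic above might look like the following. It is only a sketch: the data are simulated, and the cut points and the names obs, cuts and breaks are assumptions made for illustration. The null distribution N(0, 1) is fixed in advance, so no parameters are estimated.

```r
# Simulated data; the N(0, 1) null is fully specified in advance (no estimation).
set.seed(1)
obs <- rnorm(200)

# Interior cut points X1, ..., X[k-1]; the outer bins extend to -Inf and Inf.
cuts <- c(-1.5, -0.5, 0.5, 1.5)
breaks <- c(-Inf, cuts, Inf)

# x: observed counts per bin (right-closed intervals, matching (X1,X2], ...).
x <- as.vector(table(cut(obs, breaks = breaks)))

# p: bin probabilities under N(0, 1), as differences of the CDF at the breaks.
p <- diff(pnorm(breaks))

# p is passed by name because the second positional argument of chisq.test() is y.
fit <- chisq.test(x, p = p)
print(fit)
```

Here diff(pnorm(breaks)) replaces the three-part c(...) construction; both give the same k bin probabilities, and the reported degrees of freedom are k - 1.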

    However, this use of chisq.test(x, p = p) is limited (as far
    as I can see) to the case where no parameters have been
    estimated in choosing the distribution from which p is
    calculated, and so will be based on the wrong number
    of degrees of freedom if the distribution is estimated
    from the data. I cannot see any provision for specifying
    either the degrees of freedom, or the number of parameters
    estimated for p, in the documentation for chisq.test().

    So in the latter case you are better off doing it directly.
    This is not more difficult, since the hard work is in
    calculating the elements of p. After that, with E=N*p,

    X2 <- sum(((O-E)^2)/E)

    has (approximately) the chi-squared distribution with
    df = k-1-r d.f., where k is the number of "bins" and r is the
    number of parameters that have been estimated. So get
    1-pchisq(X2,df).
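
The direct calculation can be wrapped in a small helper, sketched below. The function name gof_chisq and the counts in O are hypothetical and purely illustrative. The k - 1 - r degrees of freedom assume the r parameters were estimated from the binned counts.

```r
# Hypothetical helper: Pearson goodness-of-fit test with r estimated parameters.
gof_chisq <- function(O, p, r = 0) {
  stopifnot(length(O) == length(p), abs(sum(p) - 1) < 1e-8)
  N <- sum(O)
  E <- N * p                        # expected counts under the model
  X2 <- sum((O - E)^2 / E)          # Pearson statistic
  df <- length(O) - 1 - r           # k - 1 - r degrees of freedom
  stopifnot(df > 0)
  list(statistic = X2, df = df,
       p.value = pchisq(X2, df, lower.tail = FALSE))
}

O <- c(12, 48, 83, 45, 12)          # illustrative counts, not real data
p <- diff(pnorm(c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)))

res0 <- gof_chisq(O, p, r = 0)      # same statistic and df as chisq.test(O, p = p)
res2 <- gof_chisq(O, p, r = 2)      # df reduced by 2, as if two parameters were fitted
print(res0)
print(res2)
```

With r = 0 this reproduces what chisq.test(O, p = p) reports; a positive r only changes the reference distribution, not the statistic.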

    Best wishes,
    Ted.


    --------------------------------------------------------------------
    E-Mail: (Ted Harding) <ted...@...uk>
    Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
    Date: 13-Jan-05 Time: 18:30:58
    ------------------------------ XFMail ------------------------------
  • Peter Dalgaard at Jan 13, 2005 at 8:12 pm

    (Ted Harding) <ted...@...uk> writes:

    > This is not more difficult, since the hard work is in
    > calculating the elements of p. After that, with E=N*p,
    >
    > X2 <- sum(((O-E)^2)/E)
    >
    > has the chi-squared distribution with df = k-1-r d.f. where
    > k is the number of "bins" and r is the number of parameters
    > that have been estimated. So get 1-pchisq(X2,df).

    As Achim indicated, this only works if you estimate the parameters
    from the binned data (and I suspect that you in principle need to have
    decided the bins in advance too.) My old Stat-1 notes had a claim that
    if you used the mean and variance of unbinned data to estimate the
    normal distribution, then the X2 would be between chi-squares with
    k-3 and k-1 d.f.

    Incidentally, my .02 DKK is that you're more likely to want a test
    against a smoother alternative than the omnibus alternative implied by
    the chi-square. For instance, if you have digit-preference effects in the
    distribution (some weight measurements rounded to nearest half kg,
    e.g.), it can throw a highly significant X2, but the deviation is of a
    character that has little importance for the validity of subsequent
    analyses. I haven't ever seen any of those for the case of estimated
    parameters, though...

    --
    O__ ---- Peter Dalgaard Blegdamsvej 3
    c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
    (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
    ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
  • Achim Zeileis at Jan 13, 2005 at 6:39 pm

    On Thu, 13 Jan 2005 18:23:37 +0100 (CET) Vito Ricci wrote:

    > Dear R-Users,
    >
    > How can I use chisq.test() as a goodness of fit test?
    > Reading the man page, I have some doubts that this kind of test is
    > available with this function. Am I wrong?
    >
    > X2 = sum((O-E)^2/E)
    >
    > O=empirical frequencies
    > E=expected freq.

    You can do
      chisq.test(O, p = E/sum(E))
    but note that this assumes that the expected frequencies/probabilities
    are known (and not estimated).

    > calculated with the model (such as normal distribution)

    "Normal distribution" is not a fully specified model! If you estimate
    the parameters by ML from the raw (ungrouped) data, the inference will
    typically not be valid. Another approach would be to estimate the
    parameters by grouped ML or minimum Chi-squared instead. See also
    ?pearson.test from package nortest and the references therein.
    For discrete distributions, this Chi-squared statistic is more natural
    (though not always without problems): see ?goodfit in package vcd.
    Z
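
The minimum Chi-squared approach mentioned above could be sketched in base R as follows. This is an illustration under stated assumptions, not the method used by nortest or vcd: the data are simulated, the bins are chosen in advance, and the names obs, breaks and x2_fun are made up for the example.

```r
set.seed(42)
obs <- rnorm(300, mean = 10, sd = 2)   # simulated data, for illustration only

# Bins fixed in advance; only the bin counts enter the estimation.
breaks <- c(-Inf, 7, 8.5, 10, 11.5, 13, Inf)
O <- as.vector(table(cut(obs, breaks = breaks)))
N <- sum(O)

# Pearson statistic as a function of (mean, log sd); the log keeps sd positive.
x2_fun <- function(par) {
  p <- diff(pnorm(breaks, mean = par[1], sd = exp(par[2])))
  E <- N * p
  sum((O - E)^2 / E)
}

# Minimise X2 over the two parameters (Nelder-Mead is optim()'s default).
# The raw-data mean and sd serve only as starting values.
start <- c(mean(obs), log(sd(obs)))
opt <- optim(start, x2_fun)

X2 <- opt$value
df <- length(O) - 1 - 2                # k - 1 - r, with r = 2 fitted parameters
p_value <- pchisq(X2, df, lower.tail = FALSE)
print(c(mean = opt$par[1], sd = exp(opt$par[2]), X2 = X2, df = df, p.value = p_value))
```

Because the parameters are chosen to minimise X2 on the binned counts, the k - 1 - r reference distribution is the appropriate one here, which is not the case when the raw-data mean and sd are plugged in.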

    > See:
    > http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
    > for X2 used as a goodness of fit test.
    >
    > Any help will be appreciated.
    > Thanks a lot. Bye.
    > Vito



    ______________________________________________
    R-help at stat.math.ethz.ch mailing list
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide!
    http://www.R-project.org/posting-guide.html
