Grokbase Groups R r-help July 2006
Hello. Douglas Bates has explained in a previous posting to R why he does
not output residual degrees of freedom, F values and probabilities in the
mixed model (lmer) function: because the usual degrees of freedom (obs -
fixed df -1) are not exact and are really only upper bounds. I am
interpreting what he said but I am not a professional statistician, so I
might be getting this wrong...
Does anyone know of any more recent results, perhaps from simulations, that
quantify the degree of bias that using such upper bounds for the denominator
degrees of freedom produces? Is it possible to calculate a lower bound for
such degrees of freedom?

Thanks for any help.

Bill Shipley
North American Editor, Annals of Botany
Editor, "Population and Community Biology" series, Springer Publishing
Département de biologie, Université de Sherbrooke,
Sherbrooke (Québec) J1K 2R1 CANADA
Bill.Shipley at USherbrooke.ca
http://pages.usherbrooke.ca/jshipley/recherche/

  • Douglas Bates at Jul 28, 2006 at 1:48 pm

    On 7/26/06, Bill Shipley wrote:

    > Hello. Douglas Bates has explained in a previous posting to R why he does
    > not output residual degrees of freedom, F values and probabilities in the
    > mixed model (lmer) function: because the usual degrees of freedom (obs -
    > fixed df -1) are not exact and are really only upper bounds. I am
    > interpreting what he said but I am not a professional statistician, so I
    > might be getting this wrong...
    > Does anyone know of any more recent results, perhaps from simulations, that
    > quantify the degree of bias that using such upper bounds for the denominator
    > degrees of freedom produces? Is it possible to calculate a lower bound for
    > such degrees of freedom?

    I have not seen any responses to your request yet Bill. I was hoping
    that others might offer their opinions and provide some new
    perspectives on this issue. However, it looks as if you will be stuck
    with my responses for the time being.

    You have phrased your question in terms of the denominator degrees of
    freedom associated with terms in the fixed-effects specification and,
    indeed, this is the way the problem is usually addressed. However,
    that is jumping ahead two or three steps from the initial problem
    which is how to perform a hypothesis test comparing two nested models
    - a null model without the term in question and the alternative model
    including this term.

    If we assume that the F statistic is a reasonable way of evaluating
    this hypothesis test and that the test statistic will have an F
    distribution with a known numerator degrees of freedom and an unknown
    denominator degrees of freedom then we can reduce the problem of
    testing the hypothesis to one of approximating the denominator degrees
    of freedom. However, there is a lot of assumption going on in that
    argument. These assumptions may be warranted or they may not.
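
    For reference, the one setting where this reduction is known to be
    exact is the ordinary linear model with fixed effects only and
    independent Gaussian errors. Writing RSS_0 and RSS_1 for the residual
    sums of squares of the null and alternative fits, with p_0 and p_1
    coefficients and n observations (symbols introduced here for
    illustration, not taken from the post):

```latex
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(p_1 - p_0)}{\mathrm{RSS}_1/(n - p_1)}
\;\sim\; F_{\,p_1 - p_0,\; n - p_1}
\quad \text{under the null model.}
```

    In a mixed model fitted by maximum likelihood or REML, neither the
    exact F distribution nor the denominator value n - p_1 carries over
    automatically, which is the assumption being questioned here.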

    As far as I can see, the usual argument made for those assumptions is
    by analogy. If we had a balanced design and if we used error strata
    to get expected and observed mean squares and if we equated expected
    and observed mean squares to obtain estimates of variance components
    then the test for a given term in the fixed effects specification
    would have a certain form. Even though we are not doing any of these
    things when estimating variance components by maximum likelihood or by
    REML, the argument is that the test for a fixed effects term should
    end up with the same form. I find that argument to be a bit of a
    stretch.

    Because the results from software such as SAS PROC MIXED are based on
    this type of argument many people assume that it is a well-established
    result that the test should be conducted in this way. Current
    versions of PROC MIXED allow for several different ways of calculating
    denominator degrees of freedom, including at least one, the
    Kenward-Roger method, that uses two tuning parameters - denominator
    degrees of freedom and a scale factor.

    Some simulation studies have been performed comparing the methods in
    SAS PROC MIXED and other simulation studies are planned but for me
    this is all putting the cart before the horse. There is no answer to
    the question "what is the _correct_ denominator degrees of freedom for
    this test statistic" if the test statistic doesn't have an F
    distribution with a known numerator degrees of freedom and an unknown
    denominator degrees of freedom.

    I don't think there is a perfect answer to this question. I like the
    approach using Markov chain Monte Carlo samples from the posterior
    distribution of the parameters because it allows me to assess the
    distribution of the parameters and it takes into account the full
    range of the variation in the parameters (the F-test approach is
    conditional on estimates of the variance components). However, it
    does not produce a nice cryptic p-value for publication.
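
    As a rough sketch of what "assessing the distribution" of a parameter
    from such samples might look like, the Python snippet below computes
    the shortest interval covering 95% of a set of draws. The draws are
    synthetic stand-ins generated from a normal distribution, not output
    from any fitted model, and `hpd_interval` is a hypothetical helper
    name, not a function from lme4.

```python
import random


def hpd_interval(samples, level=0.95):
    """Shortest interval containing a fraction `level` of the sampled values."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, min(n, int(round(level * n))))  # number of draws inside the interval
    # Slide a window of k consecutive order statistics and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-in for MCMC draws of a single fixed-effects coefficient.
    draws = [rng.gauss(0.8, 0.3) for _ in range(5000)]
    lo, hi = hpd_interval(draws)
    print(f"95% interval: ({lo:.2f}, {hi:.2f})")
    print("interval excludes 0:", not (lo <= 0.0 <= hi))
```

    An interval of this kind summarises the spread of the parameter
    directly, without requiring a reference distribution or a denominator
    degrees of freedom.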

    I understand the desire for a definitive answer that can be used in a
    publication. However, I am not satisfied with any of the "definitive
    answers" that are out there and I would rather not produce an answer
    than produce an answer that I don't believe in.
  • Douglas Bates at Jul 28, 2006 at 4:48 pm

    On 7/26/06, Bill Shipley wrote:
    > Hello. Douglas Bates has explained in a previous posting to R why he does
    > not output residual degrees of freedom, F values and probabilities in the
    > mixed model (lmer) function: because the usual degrees of freedom (obs -
    > fixed df -1) are not exact and are really only upper bounds. I am
    > interpreting what he said but I am not a professional statistician, so I
    > might be getting this wrong...
    > Does anyone know of any more recent results, perhaps from simulations, that
    > quantify the degree of bias that using such upper bounds for the denominator
    > degrees of freedom produces? Is it possible to calculate a lower bound for
    > such degrees of freedom?

    I can give another perspective on the issue of degrees of freedom for
    a linear mixed model although it probably doesn't address the question
    that you want to address.

    The linear predictor in a mixed model has the form X\beta + Zb where
    \beta is the fixed-effects vector and b is the random-effects vector.
    The fitted values, y-hat, are the fitted values from a penalized least
    squares fit of the response vector, y, to this linear predictor
    subject to a penalty on b defined by the variance components. When
    the penalty is large, the fitted values approach those from the
    ordinary least squares fit of y on X\beta only. When the penalty is
    small, the fitted values approach those from an unpenalized least
    squares fit of y on the linear predictor. (In this case estimates of
    the coefficients are not well defined because the combined matrix
    [X:Z] is generally rank deficient but the fitted values are well
    defined.)
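
    Written out, one common form of this criterion is given below. It
    assumes, purely for illustration, that b has covariance \sigma_b^2 I
    and that the residual variance is \sigma^2, so that the penalty
    reduces to a single weight \lambda = \sigma^2 / \sigma_b^2. That
    scalar \lambda is an assumption of this sketch, not notation from
    the post.

```latex
(\hat\beta, \hat b) = \arg\min_{\beta,\, b}\;
  \| y - X\beta - Zb \|^2 + \lambda \, \| b \|^2 ,
\qquad
\hat y = X\hat\beta + Z\hat b .
```

    As \lambda \to \infty the penalty forces b toward 0 and \hat y
    approaches the least squares fit on X alone; as \lambda \to 0 it
    approaches the unpenalized fit on [X:Z].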

    If the rank of X is p and the rank of [X:Z] is r then the effective
    number of parameters in the linear predictor for the penalized least
    squares fit is somewhere between p and r. One way of defining the
    effective number of parameters is as the trace of the hat matrix for
    the penalized least squares problem. This number will change as the
    variance components change and is usually evaluated at the estimates
    of the variance components. This is exactly what Spiegelhalter, Best,
    Carlin and van der Linde (JRSSB, 64(4), 583-639, 2002) define to be
    their p_D for this model. The next release of the lme4/Matrix
    packages will include an extractor function to evaluate the trace of
    the hat matrix for a fitted lmer model, using an algorithm due to
    Jialiang Li.
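
    The idea can be checked by brute force on a small example. The Python
    sketch below is not the lme4 extractor and not Jialiang Li's
    algorithm; it simply forms the penalized normal equations for a
    single scalar penalty weight `lam` (an assumption, standing in for
    the ratio of residual to random-effects variance) and computes the
    trace directly. The function names are hypothetical.

```python
def solve(a, rhs):
    """Solve a @ x = rhs (both lists of rows) by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + r[:] for row, r in zip(a, rhs)]  # augmented matrix [a | rhs]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def hat_trace(x, z, lam):
    """Trace of the hat matrix for min |y - X beta - Z b|^2 + lam * |b|^2."""
    a = [xr + zr for xr, zr in zip(x, z)]  # combined design [X:Z], one row per observation
    p, k = len(x[0]), len(a[0])
    # Cross-product matrix A'A.
    ata = [[sum(row[i] * row[j] for row in a) for j in range(k)] for i in range(k)]
    # Penalized version: add lam on the diagonal entries that belong to b only.
    pen = [row[:] for row in ata]
    for j in range(p, k):
        pen[j][j] += lam
    # tr(A (A'A + P)^{-1} A') equals tr((A'A + P)^{-1} A'A).
    sol = solve(pen, ata)
    return sum(sol[j][j] for j in range(k))


if __name__ == "__main__":
    groups, per_group = 4, 3
    # Intercept-only fixed effects and one random intercept per group.
    x = [[1.0] for _ in range(groups * per_group)]
    z = [[1.0 if g == i else 0.0 for g in range(groups)]
         for i in range(groups) for _ in range(per_group)]
    n = len(x)
    for lam in (1e6, 3.0, 1e-6):
        t = hat_trace(x, z, lam)
        print(f"lambda={lam:g}  hat trace={t:.3f}  n - hat trace={n - t:.3f}")
```

    For this balanced layout of 4 groups with 3 observations each, p = 1
    and r = 4, and the trace moves from about 1 under a very large
    penalty to about 4 under a very small one, with 2.5 at lam = 3.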

    This effective number of parameters in the linear predictor is like
    the degrees of freedom for regression. In the limiting cases it is
    exactly the degrees of freedom for regression. One might then argue
    that the degrees of freedom for residuals would be n - hat trace and
    use this for the denominator degrees of freedom in the F ratios.
    However, this number does not vary with the numerator term and many
    people will claim that it must. (I admit to being a bit perplexed as
    to why the denominator degrees of freedom should change according to
    the numerator term when the denominator of the F ratio itself doesn't
    change, but many people insist that this is the way things must be.)
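
    Because the hat trace lies between p and r, the residual degrees of
    freedom defined this way are bracketed as follows (H denotes the hat
    matrix of the penalized fit, a symbol introduced here):

```latex
p \;\le\; \operatorname{tr}(H) \;\le\; r
\quad\Longrightarrow\quad
n - r \;\le\; n - \operatorname{tr}(H) \;\le\; n - p .
```

    Under this definition only, n - p acts as the upper bound and n - r
    as a lower bound for the denominator degrees of freedom.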

    So it is possible to calculate a number that can reasonably be
    considered to be the degrees of freedom for the denominator that is
    actually used in the F ratios but this will not correspond to what
    many people will insist is the "obviously correct" number of degrees
    of freedom.
