[Rd] delete.response leaves response in attribute dataClasses
I posted this one as an R bug
(https://bugs.r-project.org/bugzilla3/show_bug.cgi?id767), but
Prof. Ripley says I'm premature, and I should raise the question here.

Here's the behavior I assert is a bug:
The output from delete.response on a terms object alters the formula
by removing the dependent variable. It removes the response from the
"variables" attribute and it changes the response attribute from 1 to
0. The response is also removed from "predvars".

But it leaves the name of the dependent variable as the first entry in
"dataClasses". This caused unexpected behavior in my code, so (as
usual) the bug may be mine, but in my heart, I believe it belongs to
delete.response.

To illustrate, here's a terms object from a regression.
> tt
y ~ x1 * x2 + x3 + x4
attr(,"variables")
list(y, x1, x2, x3, x4)
attr(,"factors")
   x1 x2 x3 x4 x1:x2
y   0  0  0  0     0
x1  1  0  0  0     1
x2  0  1  0  0     1
x3  0  0  1  0     0
x4  0  0  0  1     0
attr(,"term.labels")
[1] "x1"    "x2"    "x3"    "x4"    "x1:x2"
attr(,"order")
[1] 1 1 1 1 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: R_GlobalEnv>
attr(,"predvars")
list(y, x1, x2, x3, x4)
attr(,"dataClasses")
        y        x1        x2        x3        x4
"numeric" "numeric" "numeric" "numeric" "numeric"

Now observe that delete.response removes the response from all
attributes except dataClasses.
> delete.response(tt)
~x1 * x2 + x3 + x4
attr(,"variables")
list(x1, x2, x3, x4)
attr(,"factors")
   x1 x2 x3 x4 x1:x2
x1  1  0  0  0     1
x2  0  1  0  0     1
x3  0  0  1  0     0
x4  0  0  0  1     0
attr(,"term.labels")
[1] "x1"    "x2"    "x3"    "x4"    "x1:x2"
attr(,"order")
[1] 1 1 1 1 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 0
attr(,".Environment")
<environment: R_GlobalEnv>
attr(,"predvars")
list(x1, x2, x3, x4)
attr(,"dataClasses")
        y        x1        x2        x3        x4
"numeric" "numeric" "numeric" "numeric" "numeric"
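
For anyone who wants to reproduce the report, here is a minimal sketch. The data are simulated and the names dat, fit, tt and tt0 are illustrative assumptions; only terms() and delete.response() come from the post. Whether "y" still shows up in dataClasses may depend on the R version, so the last line prints the names rather than asserting them.

```r
# Simulated data (hypothetical); any numeric columns would do.
set.seed(1)
dat <- data.frame(x1 = rnorm(20), x2 = rnorm(20),
                  x3 = rnorm(20), x4 = rnorm(20))
dat$y <- with(dat, 1 + x1 + x2 + x1 * x2 + rnorm(20))

fit <- lm(y ~ x1 * x2 + x3 + x4, data = dat)
tt  <- terms(fit)             # terms object stored with the fit
tt0 <- delete.response(tt)    # same terms with the response removed

attr(tt0, "response")            # 0 once the response is removed
all.vars(tt0)                    # variables left in the formula
names(attr(tt0, "dataClasses"))  # inspect whether "y" is still listed
```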


pj

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

  • William Dunlap at Jan 5, 2012 at 8:56 pm
    I had noticed the same thing but figured that most
    people (writers of predict methods) would be looking
    up entries in dataClasses by name and not by position,
    since predict's newdata argument need not have entries
    in the same order as the data used to fit the model.
    Hence the extra entry would not be noticed (nor would it be
    missed if it were omitted).
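
A small sketch of the by-name lookup described above. The vector data_classes and the data frame newdata are made-up stand-ins, not objects from the thread; the point is only that indexing by column name ignores both the ordering and the extra response entry.

```r
# dataClasses as it might look after delete.response(), with the response
# still listed. The values are invented for illustration.
data_classes <- c(y = "numeric", x1 = "numeric", x2 = "factor")

# newdata lists the predictors in a different order and has no response.
newdata <- data.frame(x2 = factor(c("a", "b")), x1 = c(0.5, 1.5))

# Look up the recorded class of each supplied column by name.
recorded <- data_classes[names(newdata)]
recorded
```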

    Bill Dunlap
    Spotfire, TIBCO Software
    wdunlap tibco.com
  • William Dunlap at Jan 5, 2012 at 9:15 pm
    My feeling that everyone would index dataClasses by name was
    wrong. I looked through the packages that used dataClasses
    and saw code that would break if the first (response) entry
    were omitted. (I didn't check to see if passing the output
    of delete.response to these functions would be appropriate.)
    E.g.,
    file: AICcmodavg/R/predictSE.mer.r
      ##matrix with info on factors
      fact.frame <- attr(attr(orig.frame, "terms"), "dataClasses")[-1]

      ##continue if factors
      if(any(fact.frame == "factor")) {
        id.factors <- which(fact.frame == "factor")
        fact.name <- names(fact.frame)[id.factors] #identify the rows for factors
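
To make the positional dependence concrete, here is a sketch that contrasts the `[-1]` idiom with a name-based alternative. Both helper functions and both dataClasses vectors are hypothetical, and dc_without_response only simulates what a changed delete.response might return; none of this is code from AICcmodavg.

```r
# Two made-up dataClasses vectors: the current layout, and the layout
# if the response entry were dropped (a hypothetical change).
dc_with_response    <- c(y = "numeric", f = "factor", x1 = "numeric")
dc_without_response <- dc_with_response[names(dc_with_response) != "y"]

# Positional approach: assumes entry 1 is always the response.
factors_by_position <- function(dc) {
  preds <- dc[-1]
  names(preds)[preds == "factor"]
}

# Name-based approach: removes the response only if it is present.
factors_by_name <- function(dc, response) {
  preds <- dc[setdiff(names(dc), response)]
  names(preds)[preds == "factor"]
}

factors_by_position(dc_with_response)      # finds "f"
factors_by_position(dc_without_response)   # misses "f": it was dropped as entry 1
factors_by_name(dc_with_response, "y")     # finds "f"
factors_by_name(dc_without_response, "y")  # finds "f"
```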

    Some packages create a dataClasses attribute for a model.frame
    (not its terms attribute) that does not have any names:
    file: caper/R/macrocaic.R
      attr(mf, "dataClasses") <- rep("numeric", dim(termFactors)[2])
    .checkMFClasses() does not throw an error for that, but it
    doesn't do any real checking either.

    Most users of dataClasses do pass it to .checkMFClasses() to
    compare it with newdata and that doesn't care if you have extra
    entries in dataClasses.
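
The three behaviors of .checkMFClasses() mentioned above can be seen in a short sketch. The class vectors and data frames are invented for illustration; the function itself is the one exported by stats.

```r
# Recorded classes with an extra entry for the response (made-up values).
data_classes <- c(y = "numeric", x1 = "numeric", x2 = "numeric")

good_newdata <- data.frame(x1 = c(1, 2), x2 = c(0.5, 1.5))
bad_newdata  <- data.frame(x1 = factor(c("a", "b")), x2 = c(0.5, 1.5))

# The extra "y" entry is ignored; matching classes pass silently.
ok <- is.null(stats::.checkMFClasses(data_classes, good_newdata))

# A class mismatch on a named entry raises an error, captured here.
mismatch <- tryCatch(
  { stats::.checkMFClasses(data_classes, bad_newdata); "no error" },
  error = function(e) conditionMessage(e)
)

# With no names (as in the caper excerpt) nothing can be matched,
# so even the mismatching data passes without complaint.
unnamed <- rep("numeric", 2)
silent <- tryCatch(
  { stats::.checkMFClasses(unnamed, bad_newdata); "no error" },
  error = function(e) conditionMessage(e)
)
```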

    Bill Dunlap
    Spotfire, TIBCO Software
    wdunlap tibco.com
  • Paul Johnson at Jan 6, 2012 at 7:17 pm
    Thanks, Bill

    Counter-arguments at the end
    I can't understand what your point is. I agree we can work around the
    problem, but why should we have to?

    If you confine yourself to the output of "delete.response" applied to
    a terms object from a regression, can you point to any package or
    usage that depends on leaving the response variable in the dataClasses
    attribute? I can't find one. In R base, these are all the references
    to delete.response:

    stats/R/models.R:delete.response <- function (termobj)
    stats/R/lm.R: Terms <- delete.response(tt)
    stats/R/lm.R: Terms <- delete.response(tt)
    stats/R/ppr.R: Terms <- delete.response(object$terms)
    stats/R/loess.R: as.matrix(model.frame(delete.response(terms(object)), newdata,
    stats/R/dummy.coef.R: Terms <- delete.response(Terms)

    I've looked it over carefully and predict.lm (in lm.R) would not be
    affected by the change I propose. I can't find any usage in loess.R of
    the dataClasses attribute.
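
As a rough illustration of why a prediction method would not need the response entry, here is a sketch of the general pattern: strip the response, build a model frame from newdata, check classes by name, and multiply out. It is a simplified sketch under assumed data, not a copy of the stats source; dat, fit, newdata, mf, X and manual are all illustrative names.

```r
set.seed(2)
dat <- data.frame(x1 = rnorm(15), x2 = rnorm(15))
dat$y <- 2 + dat$x1 - dat$x2 + rnorm(15)
fit <- lm(y ~ x1 + x2, data = dat)

# newdata deliberately has no "y" column.
newdata <- data.frame(x1 = c(-1, 0, 1), x2 = c(0, 0, 0))

Terms <- delete.response(terms(fit))
mf <- model.frame(Terms, newdata, na.action = na.pass, xlev = fit$xlevels)

# Class check by name; skipped if the terms carry no dataClasses.
if (!is.null(cl <- attr(Terms, "dataClasses"))) stats::.checkMFClasses(cl, mf)

# Design matrix times coefficients gives the fitted values for newdata.
X <- model.matrix(Terms, mf)
manual <- drop(X %*% coef(fit))
```

The result can be compared with predict(fit, newdata) to confirm that the response column was never needed.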

    Furthermore, I can't see how a person would use the dataClasses
    attribute at all, after the other markers of the response are
    eliminated. How is a method to find which variable is the response,
    after response=0?

    I'm not disagreeing with you that I can work around the peculiarity
    that the response is left in the dataClasses attribute of the output
    object from delete.response. I'm just saying it is a complication
    that programmers should not have to put up with, because I think
    delete.response should delete the response from all attributes of a
    terms object.
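
One possible workaround, sketched under assumptions: a wrapper that calls delete.response() and then removes the response entry from dataClasses by name. The function drop_response_everywhere is hypothetical and not part of R, and the simulated data are only there to exercise it.

```r
# Hypothetical helper (not part of R): delete.response() plus removal of
# the response entry from "dataClasses", matched by name.
drop_response_everywhere <- function(termobj) {
  resp <- attr(termobj, "response")
  out  <- delete.response(termobj)
  dc   <- attr(out, "dataClasses")
  if (!is.null(resp) && resp > 0 && !is.null(names(dc))) {
    # The response is element 1 + resp of the "variables" call
    # (element 1 is the symbol `list`).
    resp_expr <- attr(termobj, "variables")[[1L + resp]]
    resp_name <- paste(deparse(resp_expr, width.cutoff = 500L), collapse = " ")
    attr(out, "dataClasses") <- dc[names(dc) != resp_name]
  }
  out
}

set.seed(3)
dat <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
dat$y <- dat$x1 + dat$x2 + rnorm(10)
tt <- terms(lm(y ~ x1 + x2, data = dat))

names(attr(drop_response_everywhere(tt), "dataClasses"))
```

Because the entry is matched by name, the helper does nothing extra if the response is already absent from dataClasses.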

    pj


    --
    Paul E. Johnson
    Professor, Political Science
    1541 Lilac Lane, Room 504
    University of Kansas
  • William Dunlap at Jan 6, 2012 at 8:23 pm

    Paul Johnson wrote:
    > I can't understand what your point is. I agree we can work around the
    > problem, but why should we have to?

    I guess my point was that it would make sense for delete.response
    to drop the response element from dataClasses, as it has no use.
    It was almost certainly an oversight that it wasn't dropped, as most
    terms objects don't have the dataClasses attribute.

    Properly written code, which subscripts dataClasses only by name
    (not by number), would not be affected by the change, but improperly
    written code (e.g., AICcmodavg's predictSE, which assumes the response
    is in position 1) would be adversely affected in the unlikely case that
    someone passed it the output of delete.response.

    I don't know how much you want to cater to "errors" by package writers.

    Bill Dunlap
    Spotfire, TIBCO Software
    wdunlap tibco.com



Discussion Overview
group: r-devel
categories: r
posted: Jan 5, '12 at 8:26p
active: Jan 6, '12 at 8:23p
posts: 5
users: 2
website: r-project.org
irc: #r