The model frame shows the response and predictors in a data frame with
nicely labelled columns:

fm <- lm(wt~qsec+log(hp)+sqrt(disp), data=mtcars)
model.frame(fm) # ok

When the left hand side consists of more than one response, those response
variables still look good, inside a matrix:

fm <- lm(cbind(qsec,hp,disp)~wt, data=mtcars)
model.frame(fm)[[1]] # ok

A problem arises when some of the response variables are transformed:

fm <- lm(cbind(qsec,log(hp),sqrt(disp))~wt, data=mtcars)
model.frame(fm)[[1]] # ugh, missing column names
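The difference between the last two cases can be checked directly by looking at the column names of the response matrix. This is a minimal sketch that assumes only base R and the built-in mtcars data; the object names (fm_plain, fm_trans, and so on) are made up for illustration.

```r
# Untransformed responses: every column is a plain symbol, so cbind() labels it.
fm_plain <- lm(cbind(qsec, hp, disp) ~ wt, data = mtcars)
names_plain <- colnames(model.frame(fm_plain)[[1]])
names_plain

# Transformed responses: only the plain symbol qsec gets a label,
# the columns built from log(hp) and sqrt(disp) get empty names.
fm_trans <- lm(cbind(qsec, log(hp), sqrt(disp)) ~ wt, data = mtcars)
names_trans <- colnames(model.frame(fm_trans)[[1]])
names_trans
```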

The model frame is useful for many things, even more so when all column
names are legible. Therefore I propose adding two new lines to
model.frame.default() between lines 371 and 372 in
R-patched_2010-01-12/src/library/stats/R/models.R:

varnames <- sapply(vars, deparse, width.cutoff = 500)[-1L]
variables <- eval(predvars, data, env)

NEW if (is.matrix(variables[[1L]]))
NEW     colnames(variables[[1L]]) <- as.character(formula[[2L]])[-1L]

if (is.null(rownames) && (resp <- attr(formula, "response")) > 0L) {

With this fix, the above example returns legible column names:

fm <- lm(cbind(qsec,log(hp),sqrt(disp))~wt, data=mtcars)
model.frame(fm)[[1]] # nice column names
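The naming step in the proposed lines can be tried at user level without patching model.frame.default(). The sketch below is an illustration only, not the patched function: it applies the same as.character(formula[[2L]])[-1L] expression to a response matrix extracted by hand, and the names f, mf and resp are assumptions.

```r
f <- cbind(qsec, log(hp), sqrt(disp)) ~ wt
mf <- model.frame(f, data = mtcars)

# f[[2L]] is the left-hand side call; as.character() turns it into
# c("cbind", "qsec", "log(hp)", "sqrt(disp)"), and [-1L] drops "cbind".
resp <- mf[[1L]]
if (is.matrix(resp))
    colnames(resp) <- as.character(f[[2L]])[-1L]
colnames(resp)
```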

I hope the R development team can either commit this fix or improve it.

Thanks,

Arni


  • Prof Brian Ripley at Jan 21, 2010 at 7:01 pm
    A few points.

    0) This seems a Wishlist item, but it does not say so (see the section
    on BUGS in the FAQ).

    1) A formula does not need to have an lhs, and it is an assumption
    that the response is the first element of 'variables' (an assumption
    not made a couple of lines later when 'resp' is used).

    2) I don't think this is the best way to get names. If I do

    fm <- lm(cbind(a=qsec,b=log(hp),sqrt(disp))~wt, data=mtcars)

    I want a and b as names, but that is not what your code gives. And if
    I do
    X <- with(mtcars, cbind(a = qsec, b = log(hp), c=sqrt(disp)))
    fm <- lm(X ~ wt, data=mtcars)
    model.frame(fm)[[1]]
    [,1] [,2] [,3]

    You've lost the names that the current code gives.

    The logic is that if you use a lhs which is a matrix with column
    names, then those names are used. If (as you did), you use one with
    empty column names, that is what you get in the model frame. This
    seems much more in the spirit of R than second-guessing that the
    author actually meant to give column names and create them, let alone
    renaming the columns to be different than the names supplied.

    3) It looks to me as if you wanted

    cbind(qsec, log(hp), sqrt(disp), deparse.level=2)

    but that does not give names (despite the description). I think that is
    a bug that can easily be fixed. That way we can fulfil your wish without
    breaking other people's code.
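Points 1 and 2 above can be illustrated with a few lines of base R. This is a sketch under the assumption that the built-in mtcars data is used as in the rest of the thread; the object names mf_named and mf_matrix are made up for illustration.

```r
# Point 1: a one-sided formula has no response, so terms() reports index 0.
no_lhs <- attr(terms(~ wt), "response")
no_lhs

# Point 2: names given inside cbind() are kept by the current code ...
mf_named <- model.frame(cbind(a = qsec, b = log(hp), sqrt(disp)) ~ wt,
                        data = mtcars)
colnames(mf_named[[1L]])

# ... and so are the column names of a matrix built beforehand.
X <- with(mtcars, cbind(a = qsec, b = log(hp), c = sqrt(disp)))
mf_matrix <- model.frame(X ~ wt, data = mtcars)
colnames(mf_matrix[[1L]])

# The proposed replacement names come from the lhs expression. For a bare
# symbol such as X, nothing is left after dropping the first element.
from_symbol <- as.character(quote(X))[-1L]
from_symbol
```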

    On Tue, 19 Jan 2010, arnima at hafro.is wrote:

    [...]
    --
    Brian D. Ripley, ripley at stats.ox.ac.uk
    Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
    University of Oxford, Tel: +44 1865 272861 (self)
    1 South Parks Road, +44 1865 272866 (PA)
    Oxford OX1 3TG, UK Fax: +44 1865 272595
  • Arni Magnusson at Jan 21, 2010 at 8:30 pm
    Thank you Prof. Ripley, for examining this issue. I have two more
    questions on this topic, if I may.

    (1) Truncated column names

    With your explanations I can see that the problem of missing column names
    originates in cbind() and the 'deparse.level' bug we have just discovered.
    I had tried different 'deparse.level' values, only to see that it didn't
    solve my problem of missing column names.

    attach(mtcars)
    cbind(qsec,log(hp),sqrt(disp), deparse.level=0) # no column names
    cbind(qsec,log(hp),sqrt(disp), deparse.level=1) # qsec only
    cbind(qsec,log(hp),sqrt(disp), deparse.level=2) # no column names
    cbind(qsec=qsec,log(hp),sqrt(disp), deparse.level=2) # works!
    cbind(qsec=qsec,log(hp),sqrt(abs(disp)), deparse.level=2) # hmm...

    Now a new question arises. The last line generates these truncated column
    names

    "qsec" "log(hp)" "sqrt(abs(d..."

    where the dots are not mine, but something that R decided to do,
    presumably to keep the column names no longer than 13 characters. I would
    prefer to retain the full column names, like this,

    as.matrix(data.frame(qsec,log(hp),sqrt(abs(disp)), check.names=FALSE))

    where the column names are

    "qsec" "log(hp)" "sqrt(abs(disp))"

    Is there some reason why cbind() should truncate column names? Matrices
    have no problems with very long column names.
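Two ways of getting untruncated names are sketched below, assuming base R and the built-in mtcars data. The first is the data.frame() route from the message above, wrapped in with() instead of attach(); the second names the columns explicitly in cbind(), which does not rely on 'deparse.level' at all. The object names m_df and m_cb are made up for illustration.

```r
# Route 1: data.frame() deparses unnamed arguments in full, and
# check.names = FALSE keeps labels that are not syntactic names.
m_df <- with(mtcars,
             as.matrix(data.frame(qsec, log(hp), sqrt(abs(disp)),
                                  check.names = FALSE)))
colnames(m_df)

# Route 2: give the labels explicitly; cbind() uses argument names as-is.
m_cb <- with(mtcars,
             cbind(qsec,
                   "log(hp)" = log(hp),
                   "sqrt(abs(disp))" = sqrt(abs(disp))))
colnames(m_cb)
```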


    (2) Changing the default 'deparse.level' to 2

    Furthermore, since many users appreciate the compact model formula syntax
    in R, it would be great if the formula

    cbind(qsec, log(hp), sqrt(disp)) ~ wt

    would result in a model frame with full column names, without sacrificing
    legibility by adding deparse.level=2 in between the variable names. The
    simplest way to achieve this would be by changing the default value of
    'deparse.level' to 2 in cbind() and probably rbind().

    Am I missing some important cases where functions/users rely on some of
    the column names to be missing, as generated by deparse.level=1? And if
    so, do these cases outweigh the benefits of clean and compact formula
    syntax when modelling?
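Until the default is settled, a small wrapper can give full labels inside a formula. The following is only a sketch of a workaround, not a change to cbind(): cbind_named is a hypothetical helper that does not exist in base R, and it assumes every argument is a vector so that arguments and columns match one to one.

```r
# Hypothetical helper (not part of base R): cbind() with full deparsed labels.
cbind_named <- function(...) {
    exprs <- as.list(substitute(list(...)))[-1L]   # unevaluated arguments
    m <- cbind(...)
    labels <- vapply(exprs,
                     function(e) paste(deparse(e, width.cutoff = 500L),
                                       collapse = " "),
                     character(1), USE.NAMES = FALSE)
    given <- names(exprs)                          # NULL if nothing was named
    if (!is.null(given))
        labels[nzchar(given)] <- given[nzchar(given)]
    # Only relabel when there is exactly one column per argument.
    if (ncol(m) == length(labels))
        colnames(m) <- labels
    m
}

mf <- model.frame(cbind_named(qsec, log(hp), sqrt(disp)) ~ wt, data = mtcars)
colnames(mf[[1L]])
```

Names supplied by the user take precedence over deparsed labels, so cbind_named(a = qsec, log(hp)) would label its columns "a" and "log(hp)".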


    Many thanks,

    Arni


