Grokbase Groups R r-help January 2012
FAQ

[R] formula error inside function

Moli
Jan 25, 2012 at 7:02 am
I want use survfit() and basehaz() inside a function, but it doesn't work.
Could you take a look at this problem. Thanks for your help. Following is my
codes:

library(survival)
n <- 50 # total sample size
nclust <- 5 # number of clusters
clusters <- rep(1:nclust,each=n/nclust)
beta0 <- c(1,2)
set.seed(13)
#generate phmm data set
Z <- cbind(Z1=sample(0:1,n,replace=TRUE),
Z2=sample(0:1,n,replace=TRUE),
Z3=sample(0:1,n,replace=TRUE))
b <-
cbind(rep(rnorm(nclust),each=n/nclust),rep(rnorm(nclust),each=n/nclust))
Wb <- matrix(0,n,2)
for( j in 1:2) Wb[,j] <- Z[,j]*b[,j]
Wb <- apply(Wb,1,sum)
T <- -log(runif(n,0,1))*exp(-Z[,c('Z1','Z2')]%*%beta0-Wb)
C <- runif(n,0,1)
time <- ifelse(T<C,T,C)
event <- ifelse(T<=C,1,0)
mean(event)
phmmd <- data.frame(Z)
phmmd$cluster <- clusters
phmmd$time <- time
phmmd$event <- event

fmla <- as.formula("Surv(time, event) ~ Z1 + Z2")

BaseFun <- function(x){
start.coxph <- coxph(x, phmmd)

print(start.coxph)

betahat <- start.coxph$coefficient
print(betahat)
print(333)
print(survfit(start.coxph))
m <- basehaz(start.coxph)
print(m)
}
BaseFun(fmla)



Error in formula.default(object, env = baseenv()) : invalid formula


But the following function works:
coxph(fmla, phmmd)

--
View this message in context: http://r.789695.n4.nabble.com/formula-error-inside-function-tp4326549p4326549.html
Sent from the R help mailing list archive at Nabble.com.
reply

Search Discussions

6 responses

  • Milan Bouchet-Valat at Jan 25, 2012 at 9:59 am

    Le mardi 24 janvier 2012 ? 23:02 -0800, moli a ?crit :
    I want use survfit() and basehaz() inside a function, but it doesn't work.
    Could you take a look at this problem. Thanks for your help. Following is my
    codes: <snip>
    Error in formula.default(object, env = baseenv()) : invalid formula


    But the following function works:
    coxph(fmla, phmmd)
    Please provide us with a shorter code to reproduce the problem. You
    can't expect people to read debug the whole function to identify the
    problem. You can use traceback() after the error occurs to see where it
    comes from.

    Also, reproducing the code here doesn't lead to the same error:
    Error in model.frame(formula = x, data = phmmd) : object 'x' not found


    Cheers
  • Terry Therneau at Jan 25, 2012 at 1:25 pm

    I want use survfit() and basehaz() inside a function, but it doesn't
    work. Could you take a look at this problem. Thanks for your help.
    Your problem has to do with environments, and these lines

    fmla <- as.formula("Surv(time, event) ~ Z1 + Z2")
    BaseFun <- function(x){
    start.coxph <- coxph(x, phmmd)
    ...
    survfit(start.coxph)
    }
    Basefun(fmla)

    The survfit routine needs to reconstruct the model matrix, and by
    default in R this is done in the context where the model formula was
    first defined. Unfortunately this is outside the function, leading to
    problems -- your argument "x" is is unknown in the outer envirnoment.
    The solution is to add "model=TRUE" to the coxph call so that the model
    frame is saved and survfit doesn't have to do reconstruction.

    If you think this should work as is, well, so do I. I spent a lot of
    time on this issue a few months ago and finally threw in the towel. The
    interaction of environments with model.frame and model.matrix is subtle
    and far from obvious. (Just to be clear: I didn't say "broken". Each
    aspect of the process has well thought out reasons.) The standard
    modeling functions lm, glm, etc changed their defaults from model=F to
    model=T at some point. This costs some space & memory, but coxph may
    need to do the same.

    Terry T
  • Moli at Jan 25, 2012 at 6:50 pm
    Thanks for you help. I also spent a lot of time on it. Your explanation is
    very clear.

    --
    View this message in context: http://r.789695.n4.nabble.com/formula-error-inside-function-tp4326549p4328117.html
    Sent from the R help mailing list archive at Nabble.com.
  • Hugh Morgan at Jan 29, 2012 at 7:46 pm
    Hi,

    I have what I suppose is the same problem as this. I am using the
    linear mixed model function lme, and this does not seems to take the
    attribute "model=TRUE" at the end of the function.

    Is there a more general way of solving this problem?
    Is my description of the problem below correct (from my understanding of
    cran.r-project.org/doc/contrib/Fox-Companion/appendix-scope.pdf)?

    Using the test script:

    calculate_mixed_model_p <- function() {
    dataObj=read.csv('dataMini.csv', header=TRUE, sep=",", dec=".")
    colnames(dataObj)
    attach(dataObj)
    library(nlme)

    formulaGenotype = test_variable~Genotype + Gender
    formulaNull = test_variable~Gender
    finalModelGenotype = lme(formulaGenotype, random=~1|Date, dataObj,
    na.action="na.omit", method="ML", keep.data = TRUE)
    finalModelNull = lme(formulaNull, random=~1|Date, dataObj,
    na.action="na.omit", method="ML")
    anovaModel = anova (finalModelGenotype,finalModelNull)
    print(anovaModel)
    }

    Fails with:

    Error in eval(expr, envir, enclos) : object 'formulaGenotype' not found

    I THINK function lme(...) constructs an object (finalModelGenotype) that
    has as part of it a link (pointer?) to object formulaGenotype. During
    construction this is in the function scope as it was passed to it. When
    finalModelGenotype is later passed to function anova(...) the link is
    still there but as the lme(...) scope no longer exists the link is now
    broken.

    Any help greatly apperitiated,

    Hugh

    PS, I tried to make this script self contained, and generated the data
    object with the following lines. It looks identical when you print it,
    but the lme function fails with error at [2]. If someone was to tell me
    what I am doing wrong I may be able to post easier scripts.

    [1]

    dataObj=data.frame(test_variable=c("23.0","20.2","23.8","25.6","24.6","22.7","27.7","27.5","23.5","22.8","22.3","20.9","26.6","23.8","24.5","26.8","23.2","29.9","23.3","22.5","22.2","27.2","28.1","24.5","22.7","20.7","26.2","27.1","22.0","22.2","26.7","28.5","22.2","22.1","25.3","21.7","29.3"),

    Gender=c("Female","Female","Male","Male","Male","Female","Male","Male","Female","Female","Female","Female","Male","Male","Male","Male","Female","Male","Female","Female","Female","Male","Male","Male","Female","Female","Male","Male","Female","Female","Male","Male","Female","Female","Female","Female","Male"),

    Genotype=c("10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0"),

    Assay.Date=c("01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","07/07/2010","07/07/2010","07/07/2010","01/07/2009","07/07/2010","07/07/2010","07/07/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","17/06/2010","17/06/2010","17/06/2010","17/06/2010","16/06/2010","16/06/2010","16/06/2010","16/06/2010","22/06/2010","22/06/2010","22/06/2010","22/06/2010","22/06/2010"),

    Weight=c("9.9","9.5","9.9","10","9.9","9.8","10.2","10.4","9.9","9.8","9.9","9.5","9.8","9.5","9.8","9.9","9.5","10","9.8","9.5","9.7","10","10.2","9.9","9.9","9.5","10","10","9.8","9.9","10.2","10.1","9.8","9.9","10.2","9.8","10")
    )

    [2]

    Error in `rownames<-`(`*tmp*`, value = c("1", "2", "3", "4", "5", "6", :
    attempt to set rownames on object with no dimensions
    In addition:Warning message:
    In Ops.factor(y[revOrder], Fitted) : - not meaningful for factors


    On 01/25/2012 01:25 PM, Terry Therneau wrote:
    I want use survfit() and basehaz() inside a function, but it doesn't
    work. Could you take a look at this problem. Thanks for your help.
    Your problem has to do with environments, and these lines

    fmla<- as.formula("Surv(time, event) ~ Z1 + Z2")
    BaseFun<- function(x){
    start.coxph<- coxph(x, phmmd)
    ...
    survfit(start.coxph)
    }
    Basefun(fmla)

    The survfit routine needs to reconstruct the model matrix, and by
    default in R this is done in the context where the model formula was
    first defined. Unfortunately this is outside the function, leading to
    problems -- your argument "x" is is unknown in the outer envirnoment.
    The solution is to add "model=TRUE" to the coxph call so that the model
    frame is saved and survfit doesn't have to do reconstruction.

    If you think this should work as is, well, so do I. I spent a lot of
    time on this issue a few months ago and finally threw in the towel. The
    interaction of environments with model.frame and model.matrix is subtle
    and far from obvious. (Just to be clear: I didn't say "broken". Each
    aspect of the process has well thought out reasons.) The standard
    modeling functions lm, glm, etc changed their defaults from model=F to
    model=T at some point. This costs some space& memory, but coxph may
    need to do the same.

    Terry T

    ______________________________________________
    R-help at r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.
  • Hugh Morgan at Feb 2, 2012 at 4:57 pm
    Hi,

    I have fixed this. I replaced the "=" with "<<-". I do not think this
    is the most elegant way, so if anyone else has any better ideas they
    would be very much apperitiaded.

    New lines:

    formulaGenotype <<- test_variable~Genotype + Gender
    formulaNull <<- test_variable~Gender

    Cheers,

    Hugh
    On 01/29/2012 07:46 PM, Hugh Morgan wrote:
    Hi,

    I have what I suppose is the same problem as this. I am using the
    linear mixed model function lme, and this does not seems to take the
    attribute "model=TRUE" at the end of the function.

    Is there a more general way of solving this problem?
    Is my description of the problem below correct (from my understanding
    of cran.r-project.org/doc/contrib/Fox-Companion/appendix-scope.pdf)?

    Using the test script:

    calculate_mixed_model_p <- function() {
    dataObj=read.csv('dataMini.csv', header=TRUE, sep=",", dec=".")
    colnames(dataObj)
    attach(dataObj)
    library(nlme)

    formulaGenotype = test_variable~Genotype + Gender
    formulaNull = test_variable~Gender
    finalModelGenotype = lme(formulaGenotype, random=~1|Date, dataObj,
    na.action="na.omit", method="ML", keep.data = TRUE)
    finalModelNull = lme(formulaNull, random=~1|Date, dataObj,
    na.action="na.omit", method="ML")
    anovaModel = anova (finalModelGenotype,finalModelNull)
    print(anovaModel)
    }

    Fails with:

    Error in eval(expr, envir, enclos) : object 'formulaGenotype' not found

    I THINK function lme(...) constructs an object (finalModelGenotype)
    that has as part of it a link (pointer?) to object formulaGenotype.
    During construction this is in the function scope as it was passed to
    it. When finalModelGenotype is later passed to function anova(...)
    the link is still there but as the lme(...) scope no longer exists the
    link is now broken.

    Any help greatly apperitiated,

    Hugh

    PS, I tried to make this script self contained, and generated the data
    object with the following lines. It looks identical when you print
    it, but the lme function fails with error at [2]. If someone was to
    tell me what I am doing wrong I may be able to post easier scripts.

    [1]

    dataObj=data.frame(test_variable=c("23.0","20.2","23.8","25.6","24.6","22.7","27.7","27.5","23.5","22.8","22.3","20.9","26.6","23.8","24.5","26.8","23.2","29.9","23.3","22.5","22.2","27.2","28.1","24.5","22.7","20.7","26.2","27.1","22.0","22.2","26.7","28.5","22.2","22.1","25.3","21.7","29.3"),

    Gender=c("Female","Female","Male","Male","Male","Female","Male","Male","Female","Female","Female","Female","Male","Male","Male","Male","Female","Male","Female","Female","Female","Male","Male","Male","Female","Female","Male","Male","Female","Female","Male","Male","Female","Female","Female","Female","Male"),

    Genotype=c("10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0"),

    Assay.Date=c("01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","07/07/2010","07/07/2010","07/07/2010","01/07/2009","07/07/2010","07/07/2010","07/07/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","17/06/2010","17/06/2010","17/06/2010","17/06/2010","16/06/2010","16/06/2010","16/06/2010","16/06/2010","22/06/2010","22/06/2010","22/06/2010","22/06/2010","22/06/2010"),

    Weight=c("9.9","9.5","9.9","10","9.9","9.8","10.2","10.4","9.9","9.8","9.9","9.5","9.8","9.5","9.8","9.9","9.5","10","9.8","9.5","9.7","10","10.2","9.9","9.9","9.5","10","10","9.8","9.9","10.2","10.1","9.8","9.9","10.2","9.8","10")
    )

    [2]

    Error in `rownames<-`(`*tmp*`, value = c("1", "2", "3", "4", "5", "6", :
    attempt to set rownames on object with no dimensions
    In addition:Warning message:
    In Ops.factor(y[revOrder], Fitted) : - not meaningful for factors


    On 01/25/2012 01:25 PM, Terry Therneau wrote:
    I want use survfit() and basehaz() inside a function, but it doesn't
    work. Could you take a look at this problem. Thanks for your help.
    Your problem has to do with environments, and these lines

    fmla<- as.formula("Surv(time, event) ~ Z1 + Z2")
    BaseFun<- function(x){
    start.coxph<- coxph(x, phmmd)
    ...
    survfit(start.coxph)
    }
    Basefun(fmla)

    The survfit routine needs to reconstruct the model matrix, and by
    default in R this is done in the context where the model formula was
    first defined. Unfortunately this is outside the function, leading to
    problems -- your argument "x" is is unknown in the outer envirnoment.
    The solution is to add "model=TRUE" to the coxph call so that the model
    frame is saved and survfit doesn't have to do reconstruction.

    If you think this should work as is, well, so do I. I spent a lot of
    time on this issue a few months ago and finally threw in the towel. The
    interaction of environments with model.frame and model.matrix is subtle
    and far from obvious. (Just to be clear: I didn't say "broken". Each
    aspect of the process has well thought out reasons.) The standard
    modeling functions lm, glm, etc changed their defaults from model=F to
    model=T at some point. This costs some space& memory, but coxph may
    need to do the same.

    Terry T

    ______________________________________________
    r-help@r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide
    http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.

    ______________________________________________
    r-help@r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.
  • David Winsemius at Jan 25, 2012 at 1:47 pm

    On Jan 25, 2012, at 2:02 AM, moli wrote:

    I want use survfit() and basehaz() inside a function, but it doesn't
    work.
    Could you take a look at this problem. Thanks for your help.
    Following is my
    codes:

    library(survival)
    n <- 50 # total sample size
    nclust <- 5 # number of clusters
    clusters <- rep(1:nclust,each=n/nclust)
    beta0 <- c(1,2)
    set.seed(13)
    #generate phmm data set
    Z <- cbind(Z1=sample(0:1,n,replace=TRUE),
    Z2=sample(0:1,n,replace=TRUE),
    Z3=sample(0:1,n,replace=TRUE))
    b <-
    cbind(rep(rnorm(nclust),each=n/nclust),rep(rnorm(nclust),each=n/
    nclust))
    Wb <- matrix(0,n,2)
    for( j in 1:2) Wb[,j] <- Z[,j]*b[,j]
    Wb <- apply(Wb,1,sum)
    T <- -log(runif(n,0,1))*exp(-Z[,c('Z1','Z2')]%*%beta0-Wb)
    C <- runif(n,0,1)
    time <- ifelse(T<C,T,C)
    event <- ifelse(T<=C,1,0)
    mean(event)
    phmmd <- data.frame(Z)
    phmmd$cluster <- clusters
    phmmd$time <- time
    phmmd$event <- event

    fmla <- as.formula("Surv(time, event) ~ Z1 + Z2")

    BaseFun <- function(x){
    You should adopt a practice of using more meaningful parameter names.
    In this case "x" happens to be one of the parameter names for coxph.
    So I suguggest something like:
    BaseFun <- function(frm){
    start.coxph <- coxph(x, phmmd)
    Then try changing that line to

    start.coxph <- coxph(frm, phmmd, x=TRUE,y=TRUE)

    It turns out that unless I included x=TRUE and y=TRUE, (as suggested
    by an error essage I encountered during debugging) that the right
    information was not available inside your function. Personally I would
    have passed the data to the functions as well but that doesn't seem to
    have been necessary (or sufficient to solve the problem).

    --
    David.
    print(
    )

    betahat <- start.coxph$coefficient
    print(betahat)
    print(333)
    print(survfit(start.coxph))
    m <- basehaz(start.coxph)
    print(m)
    }
    BaseFun(fmla)



    Error in formula.default(object, env = baseenv()) : invalid formula


    But the following function works:
    coxph(fmla, phmmd)

    --
    View this message in context: http://r.789695.n4.nabble.com/formula-error-inside-function-tp4326549p4326549.html
    Sent from the R help mailing list archive at Nabble.com.

    ______________________________________________
    R-help at r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.
    David Winsemius, MD
    West Hartford, CT

Related Discussions

Discussion Navigation
viewthread | post