Dear list,

I am using the lrm function from the rms package to estimate a logistic

model with weights. The c-statistic (or area under the curve) is part of

the lrm output.

To understand how the weights enter the computation of the c-statistics, I

looked at the script of lrm and lrm.fit but I am out of luck because it is

making a call to a Fortran routine and I don't know Fortran.

z <- .Fortran("lrmfit", coef = initial, nx, 1:nx, x,

y, offset, u = double(nvi), double(nvi * (nvi + 1)),

double(1), n, nx, sumw, nvi, v = double(nvi * nvi),

double(nvi), double(2 * nvi), double(nvi), integer(nvi),

opts = opts, ftable, penmat, weights, PACKAGE = "rms")

Can somebody help me figure out how the weights from the regression are

used in the computation of the c-statistic? Here is a small example that

shows that the c-statistic computed from the rms package and using the pROC

packages are not the same (not even close) when calculated from a weighted

logistic regression.

set.seed(1233)

x <- rnorm(100)

w <- runif(100)

y <- rbinom(100, 1, .5)

require(rms)

# unweighted model

umod <- lrm(y~x)

umod$stat # c-statistic is 0.5776796

# weighted model

wmod <- lrm(y~x, weight = w)

wmod$stat # c-statistic is 0.65625

# using pROC

require(pROC)

umod2 <- glm(y~x, family = binomial)

auc(y, predict(umod2)) # 0.5769

wmod2 <- glm(y~x, weights = w, family = binomial)

auc(y, predict(wmod2)) # 0.5769

BTW results from umod and umod2 and from wmod and wmod2 are identical so

the discrepancy in c-statistics in not due to using lrm vs. glm.

Best regards,

MP

[[alternative HTML version deleted]]