[R] replacing missing values with row average

Daniel M.
Feb 27, 2011 at 11:25 pm
Hello,

I have a dataset that I read from an external file using (data <-
read.csv("my file location")), and it is read as a data frame:
is(data)
[1] "data.frame" "list" "oldClass" "vector"
I have also converted this into a matrix and tried to apply my code, but
it didn't work.

Anyway, suppose I have the following data:


data <- as.data.frame(matrix(rnorm(100), nrow = 10))

And let's put some missing values

data[sample(1:10, 3), sample(1:10, 3)] <- NA

I want to replace all NAs with the row averages or column averages of my matrix.

I tried to use (with my original data matrix)

data[is.na(data)] <- rowMeans(data, na.rm = TRUE)
but got the error message

Error in rowMeans(data, na.rm = TRUE) : 'x' must be numeric
Then I converted the data with
data <- as.matrix(data)
data <- as.numeric(data)
and applied my code again:

data[is.na(data)] <- rowMeans(data, na.rm = TRUE)

Error message


Error in rowMeans(data, na.rm = TRUE) :
'x' must be an array of at least two dimensions

Then I tried converting it into an array, but the errors continued.

I also tried the code

data[is.na(data)] <- apply(data,1,mean)

but that still didn't work.

Can anyone please help me fix this?

Thank you very much

Daniel

2 responses

  • Joshua Wiley at Feb 28, 2011 at 12:14 am
    Hi Daniel,

    If your data is stored in a matrix, the following should work (and be
    fairly efficient):

    #############
    dat <- matrix(rnorm(100), nrow = 10)
    dat[sample(1:10, 3), sample(1:10, 3)] <- NA
    ## create an index of missing values
    index <- which(is.na(dat), arr.ind = TRUE)
    ## calculate the row means and "duplicate" them to assign to appropriate cells
    dat[index] <- rowMeans(dat, na.rm = TRUE)[index[, "row"]]

    ## for documentation see
    ?which # particularly the arr.ind argument
    ?"[" # for extraction or selecting a subset to overwrite
    #############
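
    Since the question also mentions column averages, here is a minimal
    sketch of the same indexing idea using colMeans() and the "col" column
    of the index instead. The seed and the `orig` copy are only there for
    illustration and are not part of the original answer.

    ```r
    set.seed(1)  # arbitrary seed, only so the example is reproducible
    dat <- matrix(rnorm(100), nrow = 10)
    dat[sample(1:10, 3), sample(1:10, 3)] <- NA
    orig <- dat  # untouched copy, kept only for comparison afterwards

    ## locate the missing cells as (row, col) pairs
    index <- which(is.na(dat), arr.ind = TRUE)
    ## look up each missing cell's column mean by its column number
    dat[index] <- colMeans(dat, na.rm = TRUE)[index[, "col"]]
    ```

    The lookup by index[, "col"] is what keeps each mean matched to its
    own column; assigning the bare vector of means to dat[is.na(dat)]
    would instead recycle the means over the missing cells in storage
    order.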

    The only reason this does not work as-is with data frames is how
    they are indexed/subset: dat[index] does not work. The
    required modification is probably fairly minimal, but if you are happy
    to use a matrix, then it's a moot issue.
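
    If you would rather stay with a data frame, one possible modification
    is to fill the gaps one column at a time. This is only a sketch and
    assumes every column is numeric (rowMeans() accepts a numeric data
    frame); the names `df`, `orig` and `row_avg` are made up for the
    example.

    ```r
    set.seed(2)  # arbitrary seed, only so the example is reproducible
    df <- as.data.frame(matrix(rnorm(100), nrow = 10))
    df[sample(1:10, 3), sample(1:10, 3)] <- NA
    orig <- df  # untouched copy, kept only for comparison afterwards

    ## assumption: all columns are numeric, otherwise rowMeans() fails
    stopifnot(all(vapply(df, is.numeric, logical(1))))

    ## one mean per row, computed before anything is overwritten
    row_avg <- rowMeans(df, na.rm = TRUE)

    ## in each column, replace the missing entries with that row's mean
    for (j in seq_along(df)) {
      miss <- is.na(df[[j]])
      df[[j]][miss] <- row_avg[miss]
    }
    ```

    Computing row_avg once up front matters: if the means were recomputed
    inside the loop, values filled in earlier columns would feed into the
    later means.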

    HTH,

    Josh

    ______________________________________________
    R-help at r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.


    --
    Joshua Wiley
    Ph.D. Student, Health Psychology
    University of California, Los Angeles
    http://www.joshuawiley.com/
  • Bert Gunter at Feb 28, 2011 at 1:02 am
    Warning: This is not a helpful answer. Actually, it's a question: Why
    do you want to do this? Replacing missing values with row or column
    averages and then analyzing the data as if the missing values were not
    there is a dangerous thing to do: it can produce biased estimates and
    understate the true error, likely resulting in biased inference. Of
    course, this depends on the specifics (how many are missing and
    where).

    R has a lot of built-in capabilities for handling missing values. I
    agree: it's not easy stuff. Nor do you necessarily need to get that
    complicated: Maybe your scheme is perfectly adequate for your
    situation. I just wanted to caution you to think about this carefully if
    you aren't aware of the possible problems and haven't already done so.
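
    To make the "understate the true error" point concrete, here is a
    small simulated illustration. Everything in it is an assumption chosen
    for the example (a single normal variable, 30% of values removed
    completely at random, the seed); it is not based on the poster's data.

    ```r
    set.seed(42)  # arbitrary seed, only so the example is reproducible
    x <- rnorm(1000, mean = 5, sd = 2)

    ## knock out 30% of the values completely at random
    x_miss <- x
    x_miss[sample(length(x), 300)] <- NA

    ## mean imputation: every gap gets the mean of the observed values
    x_imp <- x_miss
    x_imp[is.na(x_imp)] <- mean(x_miss, na.rm = TRUE)

    ## spread and standard error of the mean, observed values only
    sd_observed <- sd(x_miss, na.rm = TRUE)
    se_observed <- sd_observed / sqrt(sum(!is.na(x_miss)))

    ## the same quantities if the filled-in data are treated as real
    sd_imputed <- sd(x_imp)
    se_imputed <- sd_imputed / sqrt(length(x_imp))

    c(sd_observed = sd_observed, sd_imputed = sd_imputed)
    c(se_observed = se_observed, se_imputed = se_imputed)
    ```

    The imputed values sit exactly at the observed mean, so they add
    nothing to the sum of squared deviations while inflating the apparent
    sample size. Both the standard deviation and the standard error
    therefore come out smaller than what the observed data support.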

    -- Bert


    --
    Bert Gunter
    Genentech Nonclinical Biostatistics
