[R] Apply as.factor (or as.numeric etc) to multiple columns

Mark Na
Jun 23, 2009 at 9:23 pm
Hi R-helpers,

I have a dataframe with 60 columns and I would like to convert several
columns to factor, others to numeric, and yet others to dates. Rather
than having 60 lines like this:

data$Var1<-as.factor(data$Var1)

I wonder if it's possible to write one line of code (per data type,
e.g. factor) that would apply a function (e.g., as.factor) to several
(non-contiguous) columns. So, I could then use 3 or 4 lines of code
(for 3 or 4 data types) instead of 60.

I have tried writing an apply function, but it failed.

Thanks for any help you might be able to provide.

Mark Na

6 responses

  • Hadley Wickham at Jun 23, 2009 at 9:36 pm
    Hi Mark,

    Have a look at colwise (and numcolwise and catcolwise) in the plyr package.

    Hadley

    ______________________________________________
    R-help at r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.


    --
    http://had.co.nz/
  • Gabor Grothendieck at Jun 23, 2009 at 9:45 pm
    Try this:

    ix <- 2:5
    DF[ix] <- lapply(DF[ix], as.numeric)

    nms <- c("x", "y")
    DF[nms] <- lapply(DF[nms], as.factor)
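
A slightly fuller sketch of the same idiom, covering the three types asked about (factor, numeric, date). The data frame and its column names are made up for illustration, and the date format is an assumption about how the dates are stored.

```r
# Hypothetical example data; every column starts out as character.
DF <- data.frame(
  id    = c("1", "2", "3"),
  grp   = c("a", "b", "a"),
  score = c("1.5", "2.5", "3.5"),
  start = c("2009-06-01", "2009-06-02", "2009-06-03"),
  site  = c("x", "y", "x"),
  stringsAsFactors = FALSE
)

# One vector of column names per target type; columns need not be contiguous.
fac_cols  <- c("grp", "site")
num_cols  <- c("id", "score")
date_cols <- "start"

DF[fac_cols]  <- lapply(DF[fac_cols], as.factor)
DF[num_cols]  <- lapply(DF[num_cols], as.numeric)
# Extra arguments after the function are passed on to it by lapply().
DF[date_cols] <- lapply(DF[date_cols], as.Date, format = "%Y-%m-%d")

str(DF)
```

One caveat: calling as.numeric() on a column that is already a factor returns the internal integer codes, not the printed values. In that case function(f) as.numeric(as.character(f)) is the usual workaround.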


  • Baptiste Auguié at Jun 23, 2009 at 9:45 pm
    Wacek helped me out on a similar topic a while back,

    ize <- function(d, columns = names(d), izer = as.factor) {
        d[columns] <- lapply(d[columns], izer)
        d
    }
    d <- data.frame(x = 1:10, y = 1:10, z = 1:10)

    str(ize(d, 'y'))          # y is now a factor
    str(ize(d, 1:2, cumsum))  # x and y are affected

    etc.

    HTH,

    baptiste




    --
    _____________________________

    Baptiste Auguié

    School of Physics
    University of Exeter
    Stocker Road,
    Exeter, Devon,
    EX4 4QL, UK

    Phone: +44 1392 264187

    http://newton.ex.ac.uk/research/emag
  • Bengoechea Bartolomé Enrique (SIES 73) at Jun 25, 2009 at 7:43 am
    Hi Mark,

    I frequently need to do that when importing data. This one-liner works:
    data.frame(mapply(as, x, c("integer", "character", "factor"), SIMPLIFY=FALSE), stringsAsFactors=FALSE);
    but it has two problems:

    1) as() is an S4 (methods package) function that does not always work
    2) writing the vector of classes for 60 variables is rather tedious.

    Both issues can be solved with the following two helper functions. The first function tries as(x, class); if that doesn't work, it tries as.<class>(x); if that still doesn't work, it tries <class>(x). The second function transforms a single string into a character vector of classes by mapping each letter in the string to a class name (e.g. "D" is transformed to "Date", "i" to "integer", etc.), so that writing 60 classes is fast.

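    doCoerce <- function(x, class) {
        # First choice: the S4 coercion from the methods package
        if (canCoerce(x, class))
            as(x, class)
        else {
            # Second choice: as.<class>(x), e.g. as.Date(x)
            result <- try(match.fun(paste("as", class, sep="."))(x), silent=TRUE)
            # Last resort: <class>(x), e.g. factor(x)
            if (inherits(result, "try-error"))
                result <- match.fun(class)(x)
            result
        }
    }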

    expandClasses <- function (x) {
        unknowns <- character(0)
        # Split each string into single characters and map each code to a class name
        result <- lapply(strsplit(as.character(x), NULL, fixed = TRUE),
            function(y) {
                sapply(y, function(z) switch(z,
                    i = "integer", n = "numeric",
                    l = "logical", c = "character", x = "complex",
                    r = "raw", f = "factor", D = "Date", P = "POSIXct",
                    t = "POSIXlt", N = NA_character_, {
                        # Unrecognized code: remember it and return NA
                        unknowns <<- c(unknowns, z)
                        NA_character_
                    }), USE.NAMES = FALSE)
            })
        if (length(unknowns)) {
            unknowns <- unique(unknowns)
            # The original post called dqMsg(), a helper not shown in this thread;
            # paste(dQuote(...)) is used here as a stand-in.
            warning(sprintf(ngettext(length(unknowns), "code %s not recognized",
                "codes %s not recognized"), paste(dQuote(unknowns), collapse = ", ")))
        }
        result
    }

    An example:
    x <- data.frame(X="2008-01-01", Y=1.1:3.1, Z=letters[1:3])
    data.frame(mapply(doCoerce, x, expandClasses("Dif")[[1L]], SIMPLIFY=FALSE), stringsAsFactors=FALSE);
    Regards,

    Enrique


  • Gabor Grothendieck at Jun 25, 2009 at 11:41 am
    That's quite nice. Three comments:

    - colClasses() in R.utils is similar, except for the particular
    codes and classes supported, to expandClasses() here.

    - not sure if this is important but if as() were the last
    possibility tried rather than the first then in most
    cases (in fact all cases handled by expandClasses() )
    there would be no use of the methods package.

    - paste("as", ...) handles all the common cases including
    all cases handled by expandClasses() except NA_character_
    and could be used as a poor man's doCoerce().
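
As an illustration of the second and third comments (not code from the thread), here is a minimal sketch of a coercion helper that looks for as.<class>() first and falls back to as() from the methods package only when no such function exists. The name doCoerce2 and the example data are hypothetical.

```r
# Sketch: prefer as.<class>(); use methods::as() only as a last resort.
doCoerce2 <- function(x, class) {
  # get0() returns NULL instead of raising an error when no such function exists
  f <- get0(paste("as", class, sep = "."), mode = "function")
  if (is.null(f)) methods::as(x, class) else f(x)
}

# Hypothetical example data, one target class per column.
x <- data.frame(X = "2008-01-01", Y = c(1.1, 2.1, 3.1), Z = letters[1:3],
                stringsAsFactors = FALSE)
classes <- c("Date", "integer", "factor")

out <- data.frame(mapply(doCoerce2, x, classes, SIMPLIFY = FALSE),
                  stringsAsFactors = FALSE)
str(out)
```

For the classes listed here (Date, integer, factor) an as.<class>() function exists, so the methods package is never reached.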

  • Bengoechea Bartolomé Enrique (SIES 73) at Jun 25, 2009 at 11:57 am
    Very good points :-)
    > - colClasses() in R.utils is similar, except for the particular codes and classes supported, to expandClasses() here.
    In fact I saw colClasses() once and got the idea from it, but when I needed the functionality I did not remember where I had seen it, so I rewrote it. Now with your hint I can reuse colClasses() instead. I also use a similar function to make writing long logical vectors easier: expandLogical("TFTFNFFFTN")
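
The expandLogical() helper itself is not shown in the thread. A hypothetical reconstruction, assuming "T" means TRUE, "F" means FALSE and "N" means NA (by analogy with the N code in expandClasses()), might look like this:

```r
# Hypothetical reconstruction; the original expandLogical() was not posted.
expandLogical <- function(x) {
  # Split the string into single-character codes
  codes <- strsplit(as.character(x), "", fixed = TRUE)[[1L]]
  lookup <- c(T = TRUE, F = FALSE, N = NA)
  unknown <- setdiff(codes, names(lookup))
  if (length(unknown))
    warning("codes not recognized: ", paste(dQuote(unknown), collapse = ", "))
  # Unrecognized codes index to NA; drop the names for a plain logical vector
  unname(lookup[codes])
}

expandLogical("TFTFNFFFTN")
```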

    So, with your suggestions:
    x <- data.frame(X="2008-01-01", Y=1.1:3.1, Z=letters[1:3])
    data.frame(mapply(function(x, class) match.fun(paste("as", class, sep="."))(x), x, colClasses("Dif"), SIMPLIFY=FALSE), stringsAsFactors=FALSE)
    Enrique
