Grokbase Groups R r-help June 2016
Hello,


I am using dplyr's arrange() function to sort one of many data frames on a character variable (named "prevalence").


Issue: I am not getting the desired output (line 7 is the problem, which should be the very last line in the sorted data frame) because the sorted field is character, not numeric.


The reproducible example and the output are appended below.


Is there any work-around to convert/treat this character variable (named "prevalence" in the data frame below) as numeric before using the arrange() function within the dplyr package?


Any hints will be appreciated.


Thanks,


Pradip Muhuri


# Reproducible Example


library("readr")
library("dplyr")  # provides arrange() and desc()
testdata <- read_csv(
"indicator, prevalence
1. Health check-up, 77.2 (1.19)
2. Blood cholesterol checked, 84.5 (1.14)
3. Recieved flu vaccine, 50.0 (1.33)
4. Blood pressure checked, 88.7 (0.88)
5. Aspirin use-problems, 11.7 (1.02)
6.Colonoscopy, 60.2 (1.41)
7. Sigmoidoscopy, 6.1 (0.61)
8. Blood stool test, 14.6 (1.00)
9.Mammogram, 72.6 (1.82)
10. Pap Smear test, 73.3 (2.37)")


# Sort on the character variable in descending order
arrange(testdata, desc(prevalence))


# Results from Console


                      indicator  prevalence
                          (chr)       (chr)
 1    4. Blood pressure checked 88.7 (0.88)
 2 2. Blood cholesterol checked 84.5 (1.14)
 3           1. Health check-up 77.2 (1.19)
 4           10. Pap Smear test 73.3 (2.37)
 5                  9.Mammogram 72.6 (1.82)
 6                6.Colonoscopy 60.2 (1.41)
 7             7. Sigmoidoscopy  6.1 (0.61)
 8      3. Recieved flu vaccine 50.0 (1.33)
 9          8. Blood stool test 14.6 (1.00)
10      5. Aspirin use-problems 11.7 (1.02)




Pradip K. Muhuri, AHRQ/CFACT
  5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564


  • Daniel Nordlund at Jun 15, 2016 at 10:37 pm

    The problem is that you are sorting a character variable.

    > testdata$prevalence
    [1] "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" "11.7 (1.02)"
    [6] "60.2 (1.41)" "6.1 (0.61)" "14.6 (1.00)" "72.6 (1.82)" "73.3 (2.37)"


    Notice that the 7th element is "6.1 (0.61)". The first CHARACTER is a
    "6", so it is going to sort BEFORE the "50.0 (1.33)" (in descending
    order). If you want the character value of line 7 to sort last, it
    would need to be "06.1 (0.61)" or " 6.1 (0.61)" (notice the leading space).


    Hope this is helpful,


    Dan


    Daniel Nordlund
    Port Townsend, WA USA
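
Dan's point about character ordering can be checked directly in base R. The sketch below uses a hypothetical three-value subset of the prevalence column and compares a plain character sort, a sort on a numeric key extracted from the text before the first space, and a character sort after left-padding with zeros. The padding step assumes the leading number is the only reason the strings differ in width, which holds for these values but not in general.

```r
# Hypothetical three-value subset of the prevalence column
prev <- c("50.0 (1.33)", "6.1 (0.61)", "14.6 (1.00)")

# Character sort, descending: strings are compared character by character,
# so "6..." sorts ahead of "50..." because "6" > "5"
char_sorted <- sort(prev, decreasing = TRUE)

# Numeric key: drop everything from the first space onward, then convert
num_key <- as.numeric(sub(" .*$", "", prev))
num_sorted <- prev[order(num_key, decreasing = TRUE)]

# Left-pad with zeros to a common width so that character order
# agrees with numeric order ("06.1 (0.61)" as suggested above)
pad <- strrep("0", max(nchar(prev)) - nchar(prev))
padded_sorted <- sort(paste0(pad, prev), decreasing = TRUE)

print(char_sorted)
print(num_sorted)
print(padded_sorted)
```

Zero-padding is used here rather than a leading space because how spaces are collated can depend on the locale, whereas digits compare the same way everywhere.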
  • Jim Lemon at Jun 15, 2016 at 11:14 pm
    Hi Pradip,
    I'll assume that you are reading the data from a file:


    pm.df<-read.csv("pmdat.txt",stringsAsFactors=FALSE)
    # create a vector of numeric values of prevalence
    numprev<-as.numeric(sapply(strsplit(trimws(pm.df$prevalence)," "),"[",1))
    # order the data frame by that vector
    # (ascending; use order(numprev,decreasing=TRUE) for descending)
    pm.df[order(numprev),]


    Jim
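
Since the original question asked about dplyr, the same idea can be written inside arrange() itself. This is only a sketch: it assumes the dplyr package is installed, builds a hypothetical four-row subset of the data inline with data.frame() rather than reading it with readr, and assumes every prevalence value starts with a number followed by a space.

```r
library(dplyr)

# Hypothetical four-row subset of the thread's data, built inline
testdata <- data.frame(
  indicator  = c("3. Recieved flu vaccine", "4. Blood pressure checked",
                 "7. Sigmoidoscopy", "8. Blood stool test"),
  prevalence = c("50.0 (1.33)", "88.7 (0.88)", "6.1 (0.61)", "14.6 (1.00)"),
  stringsAsFactors = FALSE
)

# Sort on a numeric key computed on the fly: strip everything from the
# first space onward, convert to numeric, and order descending.
# The prevalence column itself stays character.
sorted <- arrange(testdata, desc(as.numeric(sub(" .*$", "", prevalence))))

print(sorted)
```

Computing the key inside arrange() leaves the data frame's columns unchanged. If the numeric value is needed elsewhere, it could instead be stored as its own column with mutate() before sorting.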




  • David Winsemius at Jun 15, 2016 at 11:16 pm

    Although the prevalence column is not really mixed numeric/alpha, it can still be sorted quite easily with the very handy gtools::mixedorder function:

    > require(gtools)
    Loading required package: gtools
    > testdata[ mixedorder(testdata$prevalence), ]
                          indicator  prevalence
    7              7. Sigmoidoscopy  6.1 (0.61)
    5       5. Aspirin use-problems 11.7 (1.02)
    8           8. Blood stool test 14.6 (1.00)
    3       3. Recieved flu vaccine 50.0 (1.33)
    6                 6.Colonoscopy 60.2 (1.41)
    9                   9.Mammogram 72.6 (1.82)
    10           10. Pap Smear test 73.3 (2.37)
    1            1. Health check-up 77.2 (1.19)
    2  2. Blood cholesterol checked 84.5 (1.14)
    4     4. Blood pressure checked 88.7 (0.88)

    The mixedorder function splits the strings at the space boundaries and tests for numeric or alpha.


    --


    David Winsemius
    Alameda, CA, USA
