Grokbase Groups R r-help June 2016
Hello,


I received 3 solutions to my earlier question. Thanks to the contributors. May I bring your attention to a new question below (with respect to David's solution)?


1) Thanks to Daniel Nordlund for the tips - replacing leading space with a 0 in the data.


2) Thanks to David Winsemius for his solution with the gtools::mixedorder function. I have added an argument to his.


mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), ]


3) Thanks to Jim Lemon for his solution. I have prepended a minus sign to reverse the order.


numprev<-as.numeric(sapply(strsplit(trimws(mydata$prevalence_c)," "),"[",1))
mydata[order(-numprev), ]
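To make solution 3 easier to follow, here is a minimal sketch that applies the same steps to the ten prevalence values from the original post. The vector name `prevalence` and the intermediate `first_piece` are illustrative only; the approach assumes every value starts with a number followed by a space.

```r
# Sample values copied from the prevalence column in the original post.
prevalence <- c("77.2 (1.19)", "84.5 (1.14)", "50.0 (1.33)", "88.7 (0.88)",
                "11.7 (1.02)", "60.2 (1.41)", "6.1 (0.61)", "14.6 (1.00)",
                "72.6 (1.82)", "73.3 (2.37)")

# Split each string on spaces and keep the first piece, e.g. "77.2".
first_piece <- sapply(strsplit(trimws(prevalence), " "), "[", 1)

# Convert to numeric so that ordering compares values, not characters.
numprev <- as.numeric(first_piece)

# Negating the numeric key gives descending order.
sorted <- prevalence[order(-numprev)]
print(sorted)
```

With these values, "88.7 (0.88)" comes first and "6.1 (0.61)" comes last, which is the order asked for in the original question.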




(New) Question for solution 2:


I want to keep only 2 variables (say, indicator and prevalence_c) in the output. Where should I insert the additional code, and why does the following code fail?

mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c(mydata$indicator, mydata$prevalence_c) ]

Error in `[.data.frame`(mydata, mixedorder(mydata$prevalence_c, decreasing = TRUE), :
   undefined columns selected


********************
str(mydata)
Classes 'tbl_df', 'tbl' and 'data.frame': 10 obs. of 10 variables:
  $ indicator : chr "1. Health check-up" "2. Blood cholesterol checked " "3. Recieved flu vaccine" "4. Blood pressure checked" ...
  $ subgroup : chr "Both sexes, ages 5 yrs""| __truncated__ "Both sexes, ages 5 yrs""| __truncated__ "Both sexes, ages 5 yrs""| __truncated__ "Both sexes, ages 5 yrs""| __truncated__ ...
  $ n : num 2117 2127 2124 2135 1027 ...
  $ prevalence_c: chr "74.7 (1.20)" "90.3 (0.89)" "51.7 (1.35)" "93.2 (0.70)" ...
  $ prevalence_p: chr "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" ...
  $ sensitivity : chr "87.4 (1.10)" "99.2 (0.27)" "97.0 (0.62)" "99.0 (0.27)" ...
  $ specificity : chr "68.3 (2.80)" "58.2 (3.72)" "93.5 (0.90)" "52.7 (3.90)" ...
  $ ppv : chr "90.4 (0.94)" "92.8 (0.85)" "93.7 (0.87)" "94.3 (0.63)" ...
  $ npv : chr "61.5 (3.00)" "92.8 (2.27)" "96.9 (0.63)" "87.5 (3.27)" ...
  $ kappa : chr "0.536 (0.029)" "0.676 (0.032)" "0.905 (0.011)" "0.626 (0.035)" ...


Pradip K. Muhuri, AHRQ/CFACT
  5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564








-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Daniel Nordlund
Sent: Wednesday, June 15, 2016 6:37 PM
To: r-help at r-project.org
Subject: Re: [R] dplyr's arrange function

On 6/15/2016 2:08 PM, Muhuri, Pradip (AHRQ/CFACT) wrote:
Hello,

I am using dplyr's arrange() function to sort one of many data frames on a character variable (named "prevalence").

Issue: I am not getting the desired output (line 7 is the problem, which should be the very last line in the sorted data frame) because the sorted field is character, not numeric.

The reproducible example and the output are appended below.

Is there any work-around to convert/treat this character variable (named "prevalence" in the data frame below) as numeric before using the arrange() function within the dplyr package?

Any hints will be appreciated.

Thanks,

Pradip Muhuri

# Reproducible Example

library("readr")
library("dplyr")  # provides arrange() and desc()
testdata <- read_csv(
"indicator, prevalence
1. Health check-up, 77.2 (1.19)
2. Blood cholesterol checked, 84.5 (1.14)
3. Recieved flu vaccine, 50.0 (1.33)
4. Blood pressure checked, 88.7 (0.88)
5. Aspirin use-problems, 11.7 (1.02)
6.Colonoscopy, 60.2 (1.41)
7. Sigmoidoscopy, 6.1 (0.61)
8. Blood stool test, 14.6 (1.00)
9.Mammogram, 72.6 (1.82)
10. Pap Smear test, 73.3 (2.37)")

# Sort on the character variable in descending order
arrange(testdata, desc(prevalence))

# Results from Console

    indicator                     prevalence
    (chr)                         (chr)
1   4. Blood pressure checked     88.7 (0.88)
2   2. Blood cholesterol checked  84.5 (1.14)
3   1. Health check-up            77.2 (1.19)
4   10. Pap Smear test            73.3 (2.37)
5   9.Mammogram                   72.6 (1.82)
6   6.Colonoscopy                 60.2 (1.41)
7   7. Sigmoidoscopy              6.1 (0.61)
8   3. Recieved flu vaccine       50.0 (1.33)
9   8. Blood stool test           14.6 (1.00)
10  5. Aspirin use-problems       11.7 (1.02)


Pradip K. Muhuri, AHRQ/CFACT
5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564


The problem is that you are sorting a character variable.

testdata$prevalence
   [1] "77.2 (1.19)" "84.5 (1.14)" "50.0 (1.33)" "88.7 (0.88)" "11.7 (1.02)"
   [6] "60.2 (1.41)" "6.1 (0.61)" "14.6 (1.00)" "72.6 (1.82)" "73.3 (2.37)"


Notice that the 7th element is "6.1 (0.61)". The first CHARACTER is a "6", so it is going to sort BEFORE the "50.0 (1.33)" (in descending order). If you want the character value of line 7 to sort last, it would need to be "06.1 (0.61)" or " 6.1 (0.61)" (notice the leading space).
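The point above can be checked with a small sketch. It uses three of the values from the post and pads them on the left with "0" to a common width; the padding step with `strrep()` is only one possible way to do it and assumes the strings differ in length only because of the leading number.

```r
x <- c("50.0 (1.33)", "6.1 (0.61)", "14.6 (1.00)")

# Character sort compares strings character by character: "6" > "5" > "1".
char_sorted <- sort(x, decreasing = TRUE)
print(char_sorted)
# "6.1 (0.61)"  "50.0 (1.33)" "14.6 (1.00)"

# Left-pad every string with "0" up to the width of the longest one.
padded <- paste0(strrep("0", max(nchar(x)) - nchar(x)), x)

# After padding, character order matches numeric order.
padded_sorted <- sort(padded, decreasing = TRUE)
print(padded_sorted)
# "50.0 (1.33)" "14.6 (1.00)" "06.1 (0.61)"
```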


Hope this is helpful,


Dan


Daniel Nordlund
Port Townsend, WA USA


______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  • David Winsemius at Jun 16, 2016 at 4:54 pm

    On Jun 16, 2016, at 6:12 AM, Muhuri, Pradip (AHRQ/CFACT) wrote:

    Hello,

    I received 3 solutions to my earlier question. Thanks to the contributors. May I bring your attention to a new question below (with respect to David's solution)?

    1) Thanks to Daniel Nordlund for the tips - replacing leading space with a 0 in the data.

    2) Thanks to David Winsemius for his solution with the gtools::mixedorder function. I have added an argument to his.

    mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), ]

    3) Thanks to Jim Lemon for his solution. I have prepended a minus sign to reverse the order.

    numprev<-as.numeric(sapply(strsplit(trimws(mydata$prevalence_c)," "),"[",1))
    mydata[order(-numprev), ]


    (New) Question for solution 2:

    I want to keep only 2 variables (say, indicator and prevalence_c) in the output. Where should I insert the additional code, and why does the following code fail?
    mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c(mydata$indicator, mydata$prevalence_c) ]



    Try instead just a vector of names for the second argument to "["


      mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE),
              c("indicator", "prevalence_c") ]
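As a sketch of why the original attempt fails: `c(mydata$indicator, mydata$prevalence_c)` concatenates the values of the two columns, so "[" then looks for columns named after those values. The three-row `mydata` below is a hypothetical stand-in built from values shown in the `str()` output, and the row ordering step is left out because only the column selection matters here.

```r
# Hypothetical three-row stand-in for mydata, using values from the thread.
mydata <- data.frame(
  indicator    = c("1. Health check-up", "2. Blood cholesterol checked",
                   "3. Recieved flu vaccine"),
  prevalence_c = c("74.7 (1.20)", "90.3 (0.89)", "51.7 (1.35)"),
  n            = c(2117, 2127, 2124),
  stringsAsFactors = FALSE
)

# c() of two columns concatenates their VALUES into one character vector.
bad_cols <- c(mydata$indicator, mydata$prevalence_c)
length(bad_cols)              # 6 values
bad_cols %in% names(mydata)   # all FALSE: none of them is a column name

# Using those values as column selectors raises an error, which is
# "undefined columns selected" in an English locale.
result <- tryCatch(mydata[, bad_cols], error = function(e) conditionMessage(e))
print(result)

# Column NAMES given as strings select the intended columns.
good <- mydata[, c("indicator", "prevalence_c")]
print(names(good))
```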


    David Winsemius
    Alameda, CA, USA
  • Muhuri, Pradip (AHRQ/CFACT) at Jun 16, 2016 at 6:06 pm
    Hello David,


    Your revisions to the earlier code have given me the desired results.


    library("gtools")
    mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c("indicator", "prevalence_c") ]
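Since the thread started with dplyr's arrange(), the same result can probably also be reached inside a dplyr pipeline. The following is only a sketch under assumptions: the four-row `mydata` is a hypothetical stand-in built from values in the `str()` output, the helper column `numprev` is borrowed from solution 3, and the `sub()` pattern assumes each value is a number followed by a space.

```r
library(dplyr)

# Hypothetical four-row stand-in for mydata, using values from the thread.
mydata <- data.frame(
  indicator    = c("1. Health check-up", "2. Blood cholesterol checked",
                   "3. Recieved flu vaccine", "4. Blood pressure checked"),
  prevalence_c = c("74.7 (1.20)", "90.3 (0.89)", "51.7 (1.35)", "93.2 (0.70)"),
  n            = c(2117, 2127, 2124, 2135),
  stringsAsFactors = FALSE
)

result <- mydata %>%
  # Drop everything from the first space onward, then convert to numeric.
  mutate(numprev = as.numeric(sub(" .*$", "", trimws(prevalence_c)))) %>%
  # Sort on the numeric key in descending order.
  arrange(desc(numprev)) %>%
  # Keep only the two columns of interest (this also drops numprev).
  select(indicator, prevalence_c)

print(result)
```

This keeps the sort key numeric without altering the displayed character column, at the cost of a temporary helper column.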


    Thanks,


    Pradip




    Pradip K. Muhuri, AHRQ/CFACT
      5600 Fishers Lane # 7N142A, Rockville, MD 20857
    Tel: 301-427-1564










