Grokbase Groups R r-help January 2012
[R] work with a subset of the dataset
Hello,
I have a big dataset with many variables and I would like to consider
only the rows in which a variable takes a specific value.

Here is an example to explain what I mean:
I have 5 variables describing a person: age, sex, weight, colour of
hair, colour of eyes.
I have 1000 rows (1000 persons) and I want to consider only the persons
whose age is between 20 and 30. How can I do this?

Thank you very much
M.

  • R. Michael Weylandt at Jan 13, 2012 at 3:26 am
    You can probably do it more easily with the subset() function but in
    my experience that often leads to more problems than solutions:
    perhaps try this.

    idx <- with(DATA, which(age > 20 & age < 30))
    DATA[idx, ]

    Michael
    ______________________________________________
    R-help at r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.
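The indexing approach in the reply above can be tried on a small made-up data frame. This is only a sketch: the data frame `DATA` and its values are invented for illustration, and only the column name `age` comes from the thread. It also shows one practical effect of wrapping the condition in `which()`: rows with a missing age are dropped instead of coming back as all-NA rows.

```r
# Hypothetical example data; only the column name `age` comes from the thread.
DATA <- data.frame(
  age    = c(18, 20, 25, 29.5, 30, NA, 41),
  weight = c(55, 60, 72, 68, 80, 75, 90)
)

# Strictly between 20 and 30, as in the reply above (endpoints excluded).
idx <- with(DATA, which(age > 20 & age < 30))
young <- DATA[idx, ]

# Without which(), the NA in `age` yields an extra row filled with NA.
young_with_na <- DATA[DATA$age > 20 & DATA$age < 30, ]
```

Here `idx` holds row positions, so `DATA[idx, ]` keeps every column of the matching rows.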
  • Marc Schwartz at Jan 13, 2012 at 3:38 am
    Presuming that 'DF' is the data frame, I am not sure what is wrong with

    NewDF <- subset(DF, (age >= 20) & (age <= 30))

    presuming that 20 and 30 are to be included.

    ?

    Marc Schwartz
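For comparison, here is a minimal sketch of the `subset()` call above on invented data. The data frame `DF` and its values are assumptions made for illustration. The condition includes both endpoints, and `subset()` treats a missing condition as FALSE, so the row with an unknown age is left out.

```r
# Hypothetical example data; the column names follow the original question.
DF <- data.frame(
  age = c(18, 20, 25, 29.5, 30, NA, 41),
  sex = c("f", "m", "f", "m", "f", "m", "f")
)

# Inclusive range: ages of exactly 20 and 30 are kept.
NewDF <- subset(DF, (age >= 20) & (age <= 30))

# The same selection written with `[` and which().
SameDF <- DF[which(DF$age >= 20 & DF$age <= 30), ]
```

Both `NewDF` and `SameDF` contain the four rows with ages 20, 25, 29.5 and 30.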
  • Schreiber, Stefan at Jan 13, 2012 at 3:50 am
    Or with what I just learned:

    subset <- mydata[mydata$age %in% c(20:30), ]

    Thanks for explaining, Michael!

    Stefan


  • Schreiber, Stefan at Jan 13, 2012 at 3:53 am
    But it is better not to call your new data frame "subset", since that is also the name of a function.


  • R. Michael Weylandt at Jan 13, 2012 at 4:05 am
    Be careful: I think that's only going to check exact equality: i.e.,
    it won't find 20.5, but it also won't find 19.9999999999997 which you
    might get when you mean 20 due to floating point error. If the OP has
    non-integer data, this will cause trouble.

    Michael

    PS -- you also don't need the call to `c` -- there's nothing the 20:30
    sequence is being combined with.
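The difference between set membership and a range test can be seen on a few invented values. This is a sketch only: the vector `age` and the tolerance `tol` are assumptions chosen for illustration, using the two non-integer values mentioned in the message above.

```r
# Hypothetical ages, including the two non-integer values from the message.
age <- c(20, 20.5, 19.9999999999997, 25, 30)

# %in% tests membership in the set 20, 21, ..., 30 (exact matches only).
in_set <- age %in% 20:30             # TRUE FALSE FALSE TRUE TRUE

# A range comparison also keeps non-integer ages such as 20.5.
in_range <- age >= 20 & age <= 30    # TRUE TRUE FALSE TRUE TRUE

# If values that are "meant to be" 20 may be slightly off, one option is a
# small tolerance on the bounds (tol is an arbitrary illustrative choice).
tol <- 1e-8
in_tol <- age >= 20 - tol & age <= 30 + tol
```

Note that the plain range test still excludes 19.9999999999997; whether such a value should count as 20 is a decision about the data, which the tolerance makes explicit.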

  • Schreiber, Stefan at Jan 13, 2012 at 4:32 am
    Thanks for the warning!

    Better to use Michael's or Marc's suggestion instead.


    Stefan

  • Marc Schwartz at Jan 13, 2012 at 4:51 am
    Just to be precise in language, the use of:

    age %in% 20:30

    is only going to match exact integer values of age from 20 to 30:

    > 20:30
    [1] 20 21 22 23 24 25 26 27 28 29 30

    That is not the same as matching any value between 20 and 30 as Michael inferred and as our respective examples would do.

    HTH,

    Marc Schwartz
  • R. Michael Weylandt at Jan 13, 2012 at 4:08 am
    Almost certainly ok here -- I have just seen too many instances where
    the non-standard evaluation of subset() tripped someone up in a
    programming context and figured it was better to get going in the `[`
    direction now rather than introducing subset() into the OP's workflow.

    Best,

    Michael
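One way the non-standard evaluation of `subset()` can trip someone up inside a function is a name clash between a function argument and a column. The sketch below is an invented illustration, not a case taken from the thread: the data frame `df`, its `cutoff` column, and both helper functions are hypothetical.

```r
# Hypothetical data with a column that happens to be called `cutoff`.
df <- data.frame(
  age    = c(18, 25, 31, 40),
  cutoff = c(100, 100, 100, 100)
)

# Inside subset(), names are looked up in the data frame first, so `cutoff`
# refers to the column (all 100), not to the function argument.
keep_older <- function(data, cutoff) {
  subset(data, age > cutoff)
}

# With `[`, `cutoff` is an ordinary variable: the function argument.
keep_older_idx <- function(data, cutoff) {
  data[which(data$age > cutoff), , drop = FALSE]
}

surprising <- keep_older(df, 20)      # 0 rows: compares age with the column
expected   <- keep_older_idx(df, 20)  # 3 rows: ages 25, 31, 40
```

At the console, with a data frame whose columns are known, `subset()` is convenient; the ambiguity mainly matters when the call is wrapped in a function that receives arbitrary data.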

Discussion Overview
group: r-help @
categories: r
posted: Jan 12, '12 at 10:25p
active: Jan 13, '12 at 4:51a
posts: 9
users: 4
website: r-project.org
irc: #r
