Hello,
I have a large dataset with many variables, and I would like to work only with
the rows in which a variable takes a specific value.

Here is an example of what I mean:
I have 5 variables describing a person: age, sex, weight, hair colour and
eye colour.
I have 1000 rows (1000 persons), and I want to consider only the persons
whose age is between 20 and 30. How can I do this?

Thank you very much
M.
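The list footer asks for reproducible code, so here is a minimal sketch of a stand-in for the data described above. The data frame name `persons`, the column names and the value distributions are all hypothetical; only the five variables and the 1000 rows come from the question.

```r
# Hypothetical stand-in for the data in the question:
# 1000 persons, 5 variables. Names and distributions are invented.
set.seed(1)  # make the random draws reproducible
n <- 1000
persons <- data.frame(
  age    = sample(15:70, n, replace = TRUE),
  sex    = sample(c("F", "M"), n, replace = TRUE),
  weight = round(rnorm(n, mean = 70, sd = 12), 1),
  hair   = sample(c("black", "brown", "blond", "red"), n, replace = TRUE),
  eyes   = sample(c("brown", "blue", "green"), n, replace = TRUE)
)
str(persons)  # overview of the columns and their types
```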


•  at Jan 13, 2012 at 3:26 am
You can probably do this more easily with the subset() function, but in
my experience that often leads to more problems than it solves.
Perhaps try this:

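```r
# which() keeps only the TRUE positions, so rows where age is NA are dropped;
# the strict inequalities exclude ages of exactly 20 and 30
idx <- with(DATA, which(age > 20 & age < 30))
DATA[idx, ]
```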
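A small sketch of what the `which()` wrapper changes, assuming a toy `DATA` with one missing age (the values are made up). With a plain logical index, a row whose condition is `NA` comes back as a row of `NA`s; `which()` keeps only the `TRUE` positions.

```r
# Toy data (hypothetical values), including one missing age
DATA <- data.frame(age = c(18, 25, NA, 29), sex = c("F", "M", "F", "M"))

keep <- with(DATA, age > 20 & age < 30)  # FALSE TRUE NA TRUE

DATA[keep, ]         # the NA in `keep` yields an extra all-NA row
DATA[which(keep), ]  # which(keep) is c(2, 4): only rows 2 and 4 are returned
```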

Michael
On Thu, Jan 12, 2012 at 5:25 PM, manu79 wrote:

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
and provide commented, minimal, self-contained, reproducible code.
•  at Jan 13, 2012 at 3:38 am
Presuming that 'DF' is the data frame, I am not sure what is wrong with

```r
NewDF <- subset(DF, (age >= 20) & (age <= 30))
```

assuming that 20 and 30 are to be included?
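The two answers so far differ at the boundaries: Michael's uses strict inequalities, while Marc's includes 20 and 30. A minimal sketch on made-up ages shows the difference; note that `subset()` also drops rows where the condition is `NA`, much like `which()` does.

```r
# Made-up ages covering both boundaries and a missing value
DATA <- data.frame(age = c(19, 20, 25, 30, 31, NA))

# Strict inequalities: 20 and 30 themselves are excluded
# (drop = FALSE keeps a data frame even though there is only one column)
strict <- DATA[with(DATA, which(age > 20 & age < 30)), , drop = FALSE]

# Inclusive bounds via subset(); rows where the condition is NA are dropped
inclusive <- subset(DATA, (age >= 20) & (age <= 30))

strict$age     # 25
inclusive$age  # 20 25 30
```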

Marc Schwartz
On Jan 12, 2012, at 9:26 PM, R. Michael Weylandt wrote:

•  at Jan 13, 2012 at 3:50 am
Or with what I just learned:

```r
subset <- mydata[mydata$age %in% c(20:30), ]
```

Thanks for explaining, Michael!

Stefan

-----Original Message-----
From: r-help-bounces at r-project.org on behalf of Marc Schwartz
Sent: Thu 1/12/2012 8:38 PM
To: R. Michael Weylandt
Cc: r-help at r-project.org; manu79
Subject: Re: [R] work with a subset of the dataset

•  at Jan 13, 2012 at 3:53 am
But it is better not to call your new data frame "subset", since that is also the name of a function.
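As a hedged illustration of why the name matters: when R looks up the function in a call, it skips objects that are not functions, so `subset()` keeps working after the assignment. The cost is mainly confusion about what the plain name `subset` refers to. The data below are made up.

```r
DATA <- data.frame(age = c(18, 22, 27, 35))  # made-up ages

# Store the result under the name "subset", as in the message above
subset <- DATA[DATA$age %in% 20:30, , drop = FALSE]

is.data.frame(subset)           # TRUE: the plain name now refers to the data frame
res <- subset(DATA, age > 20)   # still calls the function: call lookup skips non-functions

rm(subset)  # remove the data frame so `subset` means the function again
```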

-----Original Message-----
From: r-help-bounces at r-project.org on behalf of Schreiber, Stefan
Sent: Thu 1/12/2012 8:50 PM
To: Marc Schwartz; R. Michael Weylandt
Cc: r-help at r-project.org; manu79
Subject: Re: [R] work with a subset of the dataset

•  at Jan 13, 2012 at 4:05 am
Be careful: I think that only checks for exact equality. That is,
it won't find 20.5, but it also won't find 19.9999999999997, which you
might get when you mean 20 because of floating-point error. If the OP has
non-integer data, this will cause trouble.

Michael

PS: You also don't need the call to `c()`; there's nothing the 20:30
sequence is being combined with.
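A minimal sketch of the warning above, using made-up values: 20.5 is a non-integer age, and `20 - 1e-12` stands in for a computed value that prints as 20 but is not exactly 20. Note that a plain range test also misses that value at the lower boundary; the tolerance `tol` is an arbitrary illustrative choice, not a recommendation.

```r
# Made-up ages: 20.5 is non-integer; 20 - 1e-12 prints as 20 but is not exactly 20
age <- c(19, 20, 20.5, 20 - 1e-12, 30, 31)

in_set   <- age %in% 20:30         # exact matches against 20, 21, ..., 30 only
in_range <- age >= 20 & age <= 30  # any value in [20, 30]; still misses 20 - 1e-12

# Widening the bounds by a small tolerance also catches near-boundary values
tol <- 1e-8                        # illustrative value only
in_range_tol <- age >= 20 - tol & age <= 30 + tol

# As the PS says, c() around a single sequence changes nothing
identical(c(20:30), 20:30)         # TRUE
```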

On Thu, Jan 12, 2012 at 10:50 PM, Schreiber, Stefan wrote:
•  at Jan 13, 2012 at 4:32 am
Thanks for the warning!

Better to use Michael's or Marc's suggestion instead.

Stefan

-----Original Message-----
From: R. Michael Weylandt
Sent: Thu 1/12/2012 9:05 PM
To: Schreiber, Stefan
Cc: Marc Schwartz; r-help at r-project.org; manu79
Subject: Re: [R] work with a subset of the dataset

•  at Jan 13, 2012 at 4:51 am
Just to be precise in language, the use of:

```r
age %in% 20:30
```

is only going to match the exact integer values of age from 20 to 30:

```r
> 20:30
[1] 20 21 22 23 24 25 26 27 28 29 30
```

That is not the same as matching any value between 20 and 30, as Michael implied and as both of our examples would do.
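One way to see why `%in%` behaves this way: R's `?match` help page defines `x %in% table` as `match(x, table, nomatch = 0) > 0`, that is, a lookup of exact values in `table`. A short sketch with made-up ages:

```r
age <- c(19, 20, 20.5, 25, 30)  # made-up ages

via_in    <- age %in% 20:30
via_match <- match(age, 20:30, nomatch = 0L) > 0L  # the definition given in ?match

via_in                          # FALSE TRUE FALSE TRUE TRUE
identical(via_in, via_match)    # TRUE
```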

HTH,

Marc Schwartz
On Jan 12, 2012, at 10:32 PM, Schreiber, Stefan wrote:

•  at Jan 13, 2012 at 4:08 am
Almost certainly OK here. I have just seen too many instances where
the non-standard evaluation in subset() tripped someone up in a
programming context, and I figured it was better to get going in the `[`
direction now rather than introduce subset() into the OP's workflow.
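The `?subset` help page itself warns that `subset()` is a convenience function intended for interactive use and that its non-standard evaluation can have unanticipated consequences in programming. A minimal sketch of one such case, assuming a hypothetical helper that receives the column name as a string (the function names and data are illustrative only):

```r
DATA <- data.frame(age = c(18, 22, 27, 35), sex = c("F", "M", "F", "M"))

# `[` version: the column is picked by name at run time
filter_range <- function(df, col, lo, hi) {
  x <- df[[col]]
  df[which(x >= lo & x <= hi), , drop = FALSE]
}

# subset() version: `col` is evaluated as the string "age", not the column,
# so the condition is a single TRUE or FALSE and either every row or no row
# is returned, without any error
filter_range_subset <- function(df, col, lo, hi) {
  subset(df, col >= lo & col <= hi)
}

filter_range(DATA, "age", 20, 30)         # rows with age 22 and 27
filter_range_subset(DATA, "age", 20, 30)  # not a per-row filter
```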

Best,

Michael
On Thu, Jan 12, 2012 at 10:38 PM, Marc Schwartz wrote:


Discussion Overview
Group: r-help; category: r; posted Jan 12, 2012 at 10:25 PM; last active Jan 13, 2012 at 4:51 AM; 9 posts; 4 users; website: r-project.org; IRC: #r
