-----Original Message-----

From: r-help-bounces at r-project.org

[mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius

Sent: Monday, February 21, 2011 3:11 PM

To: David Winsemius

Cc: r-help at r-project.org; IgnacioQM

Subject: Re: [R] How to delete rows with specific values on all columns(variables)?

On Feb 21, 2011, at 6:05 PM, David Winsemius wrote:

On Feb 21, 2011, at 4:03 PM, IgnacioQM wrote:

I need to filter my data. I think it's easy, but I'm stuck, so I'd appreciate some help.

I have a data frame with 14 variables and 6 million rows. About half of these rows have a value of "0" in 12 variables (the other two variables always have values). How can I delete the rows in which all 12 variables have the value "0"?

example (from my data, variable 14 is missing):

1783 81 85 78 89 71 97 76 66 88 95 95 98 -57.48258
1784 81 86 79 90 71 97 77 66 88 95 95 98 -57.43768
1785 81 86 79 90 71 98 77 66 89 95 94 98 -57.39278
1786  0  0  0  0  0  0  0  0  0  0  0  0 -57.34788
1787  0  0  0  0  0  0  0  0  0  0  0  0 -57.30298
1788 80 86 80 90 72 98 78 66 88 93 93 96 -57.25808
1789 77 83 78 88 70 95 76 63 86 91 90 93 -57.21318
1790 77 84 79 89 70 96 76 64 87 91 90 93 -57.16828

I would need to delete rows 1786 & 1787.

something along the lines of:

dfrm[ -apply(dfrm, 1, function(x) all(x==0) ), ]

Looking at it a second time, I see the qualification applies to only the first 12 columns, so

dfrm[ -apply(dfrm[, 1:12], 1, function(x) all(x==0) ), ]

I think you want !apply, not -apply, as in

f0 <- function (dfrm) {
    dfrm[!apply(dfrm[, 1:12], 1, function(x) all(x == 0)), ]
}
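To see why the sign matters, here is a small sketch on a made-up data frame (the names toy and allZero are only for illustration, they are not from the original posts). Negating a logical vector coerces it to 0 and -1, so the subscript becomes a negative index rather than a logical filter:

```r
# Hypothetical toy data: rows 2 and 3 are all-zero.
toy <- data.frame(a = c(1, 0, 0, 4), b = c(5, 0, 0, 8))

# One logical per row: TRUE where every value in the row is 0.
allZero <- apply(toy, 1, function(x) all(x == 0))

# Unary minus coerces TRUE/FALSE to numbers: 0 -1 -1 0.
# As a row subscript, 0 is ignored and -1 drops row 1,
# so the all-zero rows are kept and a good row is lost.
wrong <- toy[-allZero, ]

# Logical negation keeps exactly the rows that are not all-zero.
right <- toy[!allZero, ]

rownames(wrong)  # "2" "3" "4"
rownames(right)  # "1" "4"
```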

The email formatting obscured that, compounded by the fact that you didn't post a reproducible data object.

A faster and safer way would be to operate a column at a time (faster when there are many more rows than columns) and to avoid apply (safer, because apply turns the data.frame into a matrix whose storage.mode might surprise you and lead to errors in the x==0 test).

E.g.,

f1 <- function (dfrm) {
    isZero <- function(x) !is.na(x) & x == 0
    areAllColsZero <- isZero(dfrm[, 1])
    for (col in dfrm[, 2:12]) areAllColsZero <- areAllColsZero & isZero(col)
    dfrm[!areAllColsZero, , drop = FALSE]
}
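As a rough illustration of the storage.mode surprise mentioned above, consider a made-up data frame with one numeric and one character column (mixed and m are hypothetical names). apply() goes through as.matrix(), which for such a data frame produces a character matrix, with numeric columns converted by format():

```r
# Hypothetical toy data: one numeric column, one character column.
mixed <- data.frame(n = c(0, 10.5), s = c("0", "a"),
                    stringsAsFactors = FALSE)

# This is the conversion apply() performs on a data.frame.
m <- as.matrix(mixed)
storage.mode(m)   # "character"

# In the data frame the first value of n is a numeric zero ...
mixed$n[1] == 0   # TRUE

# ... but in the matrix it is a formatted, padded string, and the
# comparison is done on strings, so the zero test fails.
m[1, "n"] == 0    # FALSE
```

Working column by column, as f1 does, compares each column in its own type and sidesteps this conversion.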

You can use Reduce() instead of the loop, but the loop code

is easy to understand.
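For reference, one way the Reduce() variant might look. This is only a sketch: the name f2 and the cols argument are my additions, not part of the code above, and the toy data is made up.

```r
# Sketch of a Reduce()-based variant; 'f2' and 'cols' are hypothetical.
f2 <- function(dfrm, cols = 1:12) {
    isZero <- function(x) !is.na(x) & x == 0
    # One logical vector per tested column, combined element-wise with &.
    areAllColsZero <- Reduce(`&`, lapply(dfrm[cols], isZero))
    dfrm[!areAllColsZero, , drop = FALSE]
}

# Small made-up example: only row 2 is zero in both tested columns.
toy <- data.frame(a = c(1, 0, 0), b = c(0, 0, 2), id = 1:3)
res <- f2(toy, cols = 1:2)
res$id   # 1 3
```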

I made some fake data with the following function, which

makes all but 12 rows be all-zero:

makeData <- function (nrow) {
    rowNum <- seq_len(nrow)
    data.frame(lapply(structure(1:12, names = paste("X", 1:12, sep = "")),
                      function(i) as.integer(rowNum == (2 * i))),
               Data1 = 1:nrow, Data2 = sqrt(1:nrow))
}

To test this out:

dfrm <- makeData(1e6) # million rows, 12 to keep
system.time(r0 <- f0(dfrm))
   user  system elapsed
  21.45    0.60   22.55
system.time(r1 <- f1(dfrm)) # faster
   user  system elapsed
   0.87    0.07    0.88
identical(r0, r1) # gives same results
[1] TRUE
dim(r0)
[1] 12 14
r0
  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 Data1    Data2
2  1  0  0  0  0  0  0  0  0   0   0   0     2 1.414214
4  0  1  0  0  0  0  0  0  0   0   0   0     4 2.000000
...

Bill Dunlap

Spotfire, TIBCO Software

wdunlap tibco.com

I tried subset with variable1>"0"&variable2>"0", but it wasn't useful because it only kept the rows that didn't have a 0 in any of the variables; I only need to drop the rows that have 0 in ALL of the variables simultaneously.
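A small sketch of why the & condition is too strict, using a made-up three-row data frame (toy, strict and kept are hypothetical names; variable1 and variable2 stand in for the real columns). Dropping rows that are zero in ALL columns is the same as keeping rows that are non-zero in ANY column, which means | rather than &:

```r
# Hypothetical toy data: only row 2 is zero in both variables.
toy <- data.frame(variable1 = c(3, 0, 0),
                  variable2 = c(0, 0, 5),
                  id = 1:3)

# '&' keeps only rows with no zero at all; here that is no rows.
strict <- subset(toy, variable1 > 0 & variable2 > 0)
nrow(strict)   # 0

# '|' keeps every row with at least one non-zero value.
kept <- subset(toy, variable1 != 0 | variable2 != 0)
kept$id        # 1 3
```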

Thanks,

Ignacio

David Winsemius, MD

West Hartford, CT

______________________________________________

R-help at r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.