-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
Sent: Monday, February 21, 2011 3:11 PM
To: David Winsemius
Cc: r-help at r-project.org; IgnacioQM
Subject: Re: [R] How to delete rows with specific values on all columns(variables)?
On Feb 21, 2011, at 6:05 PM, David Winsemius wrote:
On Feb 21, 2011, at 4:03 PM, IgnacioQM wrote:
I need to filter my data. I think it's easy, but I'm stuck, so I'd
appreciate some help.
I have a data frame with 14 variables and 6 million rows. About half of
these rows have a value of "0" in 12 variables (the other two variables
always have values). How can I delete the rows in which all 12 variables
have the value "0"?
Example (from my data; variable 14 is missing):
1783 81 85 78 89 71 97 76 66 88 95 95 98 -57.48258
1784 81 86 79 90 71 97 77 66 88 95 95 98 -57.43768
1785 81 86 79 90 71 98 77 66 89 95 94 98 -57.39278
1786  0  0  0  0  0  0  0  0  0  0  0  0 -57.34788
1787  0  0  0  0  0  0  0  0  0  0  0  0 -57.30298
1788 80 86 80 90 72 98 78 66 88 93 93 96 -57.25808
1789 77 83 78 88 70 95 76 63 86 91 90 93 -57.21318
1790 77 84 79 89 70 96 76 64 87 91 90 93 -57.16828
I would need to delete rows 1786 & 1787.
something along the lines of:
dfrm[ -apply(dfrm, 1, function(x) all(x==0) ), ]
Looking at it a second time, I see the qualification of only the
first 12 columns, so
dfrm[ -apply(dfrm[, 1:12], 1, function(x) all(x==0) ), ]
I think you want !apply, not -apply, as in
f0 <- function (dfrm) {
    dfrm[!apply(dfrm[, 1:12], 1, function(x) all(x == 0)), ]
}
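To see why the sign matters, here is a small sketch on a made-up
three-row data frame (the names d and allZero are hypothetical, not
from the original post). Negating a logical vector with - coerces it
to 0 and -1, so it is treated as a numeric subscript instead of a
row mask:

```r
# Toy data: only row 2 is all-zero
d <- data.frame(a = c(5, 0, 7), b = c(1, 0, 2))

allZero <- apply(d, 1, function(x) all(x == 0))  # FALSE TRUE FALSE

# -allZero is c(0, -1, 0): a numeric subscript meaning "drop row 1"
wrong <- d[-allZero, ]

# !allZero is c(TRUE, FALSE, TRUE): a logical mask keeping rows 1 and 3
right <- d[!allZero, ]
```

With -, the all-zero row 2 survives and row 1 is lost; with !, only
row 2 is removed.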
Email formatting obscured that, compounded by the fact that you didn't
post a reproducible data object.
A faster and safer way would be to operate on one column
at a time (faster when there are many more rows than
columns) and to avoid apply (safer, because apply turns
the data.frame into a matrix whose storage.mode
might surprise you and lead to errors in the x==0 test).
E.g.,
f1 <- function (dfrm) {
    isZero <- function(x) !is.na(x) & x == 0
    areAllColsZero <- isZero(dfrm[, 1])
    for (col in dfrm[, 2:12]) areAllColsZero <- areAllColsZero & isZero(col)
    dfrm[!areAllColsZero, , drop = FALSE]
}
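As a rough illustration of the storage.mode caveat above, consider a
made-up data frame with one character column (d, viaApply and viaCols
are hypothetical names for this sketch). When all selected columns are
numeric the matrix is numeric, but a single non-numeric column makes
the whole matrix character, and x == 0 then becomes a string
comparison:

```r
# Hypothetical mixed-type data frame (not from the original post)
d <- data.frame(x = c(0, 10), y = c(0, 5), id = c("a", "b"),
                stringsAsFactors = FALSE)

storage.mode(as.matrix(d[, 1:2]))  # numeric columns only: "double"
storage.mode(as.matrix(d))         # with the character column: "character"

# apply() works on the character matrix, where numbers have been converted
# to strings (possibly padded), so this result can differ from viaCols
viaApply <- apply(d, 1, function(x) all(x[1:2] == 0))

# Column-at-a-time comparison keeps each column's own type
viaCols <- d$x == 0 & d$y == 0
```

Selecting only the numeric columns before calling apply, as f0 does
with dfrm[, 1:12], avoids this as long as those columns really are all
numeric.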
You can use Reduce() instead of the loop, but the loop code
is easy to understand.
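For reference, a minimal sketch of what the Reduce() variant might look
like. The function name f2 and the cols argument are assumptions added
for illustration; they are not part of the original message:

```r
f2 <- function(dfrm, cols = 1:12) {
    # TRUE where a value is a non-missing zero
    isZero <- function(x) !is.na(x) & x == 0
    # One logical vector per selected column, combined pairwise with `&`
    areAllColsZero <- Reduce(`&`, lapply(dfrm[cols], isZero))
    dfrm[!areAllColsZero, , drop = FALSE]
}

# Small usage example on made-up data
d <- data.frame(a = c(0, 1, 0), b = c(0, 0, NA), z = c(10, 20, 30))
res <- f2(d, cols = 1:2)  # drops row 1 only; the NA in row 3 is not a zero
```

It does the same column-at-a-time work as the loop, so it should behave
like f1 on the same input.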
I made some fake data with the following function, which
makes all but 12 rows be all-zero:
makeData <- function (nrow) {
    rowNum <- seq_len(nrow)
    data.frame(lapply(structure(1:12, names = paste("X", 1:12, sep = "")),
                      function(i) as.integer(rowNum == (2 * i))),
               Data1 = 1:nrow, Data2 = sqrt(1:nrow))
}
To test this out:
> dfrm <- makeData(1e6) # million rows, 12 to keep
> system.time(r0 <- f0(dfrm))
   user  system elapsed
  21.45    0.60   22.55
> system.time(r1 <- f1(dfrm)) # faster
   user  system elapsed
   0.87    0.07    0.88
> identical(r0, r1) # gives same results
[1] TRUE
> dim(r0)
[1] 12 14
> r0
  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 Data1    Data2
2  1  0  0  0  0  0  0  0  0   0   0   0     2 1.414214
4  0  1  0  0  0  0  0  0  0   0   0   0     4 2.000000
...
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
I tried subset with variable1>"0"&variable2>"0", but it wasn't useful
because it only kept the rows that didn't have a 0 in any of the
variables; I only need to drop the rows that have 0 in ALL of the
variables simultaneously.
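The gap between that subset() attempt and the desired filter comes down
to & versus |: "not all zero" is the same as "at least one nonzero". A
small sketch on made-up data, where v1 and v2 are hypothetical stand-ins
for the 12 measurement columns:

```r
# Toy data: row 2 is all-zero, rows 3 and 4 are partly zero
d <- data.frame(v1 = c(81, 0, 0, 77),
                v2 = c(85, 0, 3, 0),
                other = c(-57.5, -57.3, -57.2, -57.1))

# Conditions joined with & keep only rows with no zero at all (row 1)
noZeros <- subset(d, v1 > 0 & v2 > 0)

# Conditions joined with | drop only rows where every column is zero (row 2)
notAllZero <- subset(d, v1 != 0 | v2 != 0)
```

Comparing against the number 0 instead of the string "0" also keeps the
test numeric.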
Thanks,
Ignacio
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.