Hi Shashi,
Since I do not want to create a large fake data set and then
painstakingly test and debug your code, why not try your code with a
subset of the data, maybe only 400 rows. If that runs slowly, your
code is very inefficient (it looks as though it is). You can then
begin to identify where the efficiency of the code can be improved.
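
A minimal sketch of that approach, assuming the slow script still resembles the loop posted earlier in this thread (cosine similarity of the first row against every other row, with the document number in column 1). The fake data and the names sub, m, ref and cos_sim are made up for illustration; the real subset would come from something like read.csv(src, nrows = 400).

```r
# Fake stand-in for a 400-row subset of the real file (assumption)
set.seed(1)
sub <- data.frame(ID = 1:400,
                  matrix(rpois(400 * 50, 0.2), nrow = 400))

# Convert the count columns to a numeric matrix once, outside any loop
m <- as.matrix(sub[, -1])
ref <- m[1, ]
others <- m[-1, , drop = FALSE]

timing <- system.time({
  num <- as.vector(others %*% ref)                  # dot products with row 1
  den <- sqrt(rowSums(others^2)) * sqrt(sum(ref^2)) # product of the two norms
  cos_sim <- ifelse(den > 0, num / den, 0)          # 0 where a norm is zero
})
print(timing)
```

If the subset runs quickly in this form but slowly in the original loop, the element-by-element indexing of the data frame and the repeated append() calls are the likely places to look.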


Jim



On Tue, Jun 14, 2016 at 10:41 PM, SHASHI SETH wrote:
Dear Jim,

Thanks for your suggestion. The earlier problem is solved with your advice.
My code is taking too long to execute, more than 30 hours. There are 40309
rows and 26952 columns, and the file size is 110 MB. Please guide me on
what is wrong.

Shashi
On Thu, 09 Jun 2016 14:27:17 +0530 Jim Lemon wrote
Hi Shashi,
Without trying to go through all that code, your error is something
simple. When you read in "matrixdata" right in the beginning, you are
getting a data frame, not a vector or a matrix (which in some cases
can be treated like a vector). That will cause trouble at some point.
Another thing is that when you call this:

if((sum > 0 && sums1 > 0 && sums2 > 0) != NA)

you seem to be asking for the union of three multi-valued vectors (?)
which will probably cause at least a warning, but the error suggests
that at least one of these objects has an NA value somewhere. This
might be because "dtm_500_1.CSV" (whatever that is) has NA values in
it. The code is fairly obscure and I can only say that your best bet
is probably to check the initial data frame for NA values and then
print out the results of each step, or at least

cat(sum(is.na(x)),"\n")

where x is the object you have just created. That should allow you to
find where in the tangle of code the NAs are appearing.
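
A short illustration of why that if() fails, offered as a sketch and not as a diagnosis of the actual data: in R any comparison with NA yields NA, so a "!= NA" test can never be TRUE or FALSE. The values and the helper name safe_positive are hypothetical.

```r
# Any comparison against NA is itself NA, which if() cannot use
res <- (TRUE && TRUE) != NA
print(res)

# is.na() is the usual way to test for missing values
x <- c(3, NA, 0, 2)
n_missing <- sum(is.na(x))
cat(n_missing, "\n")

# Hypothetical helper: TRUE only when the value is present and positive
safe_positive <- function(v) !is.na(v) && v > 0
ok <- safe_positive(4) && safe_positive(NA)
print(ok)
```

Testing each quantity with is.na() first, as in safe_positive(), gives if() a definite TRUE or FALSE even when a value is missing.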



Jim






On Thu, Jun 9, 2016 at 4:49 PM, SHASHI SETH wrote:

Hi Jim,

I am getting the following error:
Error in if ((sum > 0 && sums1 > 0 && sums2 > 0) != NA) { :
missing value where TRUE/FALSE needed


I have included my code below for your review:

fitness_1_data <- c();

src="dtm_500_1.CSV"
matrixdata <- read.csv(src)

#get no vector/column from file/matrix
noofvec <- length(matrixdata)

#set no of records/rows/document
noofrecords <- length(matrixdata[,1])

#set row index
rindex<-1;

#prepare header
colindex<-1;
colList <- colnames(matrixdata)

combine<-"";

vec_fitness_data<- c();

while(colindex <= length(colList))
{
fitness_1_data <- append(fitness_1_data,colList[colindex])

colindex<- colindex+1
}

#add two additional vector for percentage and cluster
fitness_1_data <- append(fitness_1_data,"percentage")
fitness_1_data <- append(fitness_1_data,"Cluster")

write.table(as.list(fitness_1_data), file ="Result_500_cycle1.csv",append = TRUE,
row.names = FALSE, col.names = FALSE, sep=",")

#end header record

nestedloopindex <- 2


while( nestedloopindex <= noofrecords )
{

#init of temporary variables
sums1 <- 0;
sums2 <- 0;
sum <- 0;

#set initial index to column 2; column one holds the document number, not actual data
colindex <- 2;

# combine <-"";

vec1 <- c();
vec2 <- c();

#add document number in vector
vec1 <- append(vec1,matrixdata[rindex,1]);
vec2 <- append(vec2,matrixdata[nestedloopindex,1]);

#declaration of temp -out variable for calculation
#out <- 0;


while(colindex <= noofvec )
{


vec1 <- append(vec1,matrixdata[rindex,colindex]);
vec2 <- append(vec2,matrixdata[nestedloopindex,colindex]);

sum = sum +
matrixdata[rindex,colindex]*matrixdata[nestedloopindex,colindex]

sums1 <- sums1 + matrixdata[rindex,colindex]^2;

sums2 <- sums2 + matrixdata[nestedloopindex,colindex]^2;

colindex <- colindex+1
}

if((sum > 0 && sums1 > 0 && sums2 > 0) != NA)
{

out <- sum / ((sqrt(sums1) * sqrt(sums2)))
}else
{
out <-0
}

vec1 <- append(vec1,out);
vec1 <-append(vec1, "1")
vec2 <- append(vec2, out);



if(nestedloopindex==2)
{
write.table(as.list(vec1), file ="Result_500_cycle1.csv",append = TRUE,
row.names = FALSE, col.names = FALSE, sep=",")
write.table(as.list(vec2), file ="Result_500_cycle1.csv",append = TRUE,
row.names = FALSE, col.names = FALSE, sep=",")
nestedloopindex<- nestedloopindex+1
} else
{
write.table(as.list(vec2), file ="Result_500_cycle1.csv",append = TRUE,
row.names = FALSE, col.names = FALSE, sep=",")
nestedloopindex<- nestedloopindex+1
}

}


With Best Regards,
Shashi

On Thu, 09 Jun 2016 04:45:09 +0530 Jim Lemon wrote
Hi John,

With due respect to the other respondents, here is something that might
help:



# get a vector of values
foo<-rnorm(100)
# get a vector of increasing indices (aka your "recent" values)
bar<-sort(sample(1:100,40))
# write a function to "clump" the adjacent index values
clump_adj_int<-function(x) {
 index_list<-list(x[1])
 list_index<-1
 for(i in 2:length(x)) {
  if(x[i]==x[i-1]+1)
   index_list[[list_index]]<-c(index_list[[list_index]],x[i])
  else {
   list_index<-list_index+1
   index_list[[list_index]]<-x[i]
  }
 }
 return(index_list)
}
index_clumps<-clump_adj_int(bar)
# write another function to sum the values
sum_subsets<-function(indices,vector)
 return(sum(vector[indices],na.rm=TRUE))
# now "apply" the function to the list of indices
lapply(index_clumps,sum_subsets,foo)
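
The same grouping can also be sketched without an explicit loop, using diff() and cumsum() to label each run of consecutive integers. This is an alternative formulation, not part of the reply above; clump_id and clump_sums are made-up names, and the seed is only there to make the example repeatable.

```r
set.seed(42)
foo <- rnorm(100)
bar <- sort(sample(1:100, 40))

# A new clump starts wherever the gap to the previous index is not 1
clump_id <- cumsum(c(1, diff(bar) != 1))
index_clumps <- split(bar, clump_id)

# Sum the values of foo within each clump
clump_sums <- sapply(index_clumps, function(idx) sum(foo[idx], na.rm = TRUE))
print(clump_sums)
```

split() returns a list just like clump_adj_int() does, so the lapply() call with sum_subsets would work on it unchanged.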



Jim





On Thu, Jun 9, 2016 at 2:41 AM, John Logsdon wrote:

Folks

Is there any way to get the row index into apply as a variable?

I want a function to do some sums on a small subset of some very long
vectors, rolling through the whole vectors.

apply(X,1,function {do something}, other arguments)

seems to be the way to do it.

The subset I want is the most recent set of measurements only - perhaps a
couple of hundred out of millions - but I can't see how to index each
value. The ultimate output should be a matrix of results the length of
the input vector. But to do the sum I need to access the current row
number.

It is easy in a loop but that will take ages. Is there any vectorised
apply-like solution to this?

Or does apply etc only operate on each row at a time, independently of
other rows?
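
One common way to make the row number available, sketched under the assumption that the goal is a sum over the most recent k values at each position: iterate over the indices themselves with sapply() instead of over the rows. The window size k, the vector x and the function name trailing_sum are illustrative only. For plain sums, a difference of cumsum() values gives the same result without any per-row function call.

```r
x <- c(2, 4, 6, 8, 10, 12)
k <- 3   # hypothetical window: the current value plus the two before it

# Index-based version: i plays the role of the row number
trailing_sum <- function(i, v, k) sum(v[max(1, i - k + 1):i])
by_index <- sapply(seq_along(x), trailing_sum, v = x, k = k)

# Vectorised version of the same trailing sum via cumulative sums
cs <- cumsum(x)
lagged <- c(rep(0, k), cs)[seq_along(x)]   # cs shifted by k, padded with 0
by_cumsum <- cs - lagged

print(by_index)
print(by_cumsum)
```

The sapply() form is still a loop internally, so for millions of values the cumsum() form is likely to be the faster of the two.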


Best wishes

John

John Logsdon
Quantex Research Ltd
+44 161 445 4951/+44 7717758675

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





  • Jim Lemon at Jun 15, 2016 at 4:12 am
    I'm still unsure of what you are attempting to do with this data.
    First, it is very sparse, appearing to be the counts of occurrences of
    2567 strings, some of which are recognizable English words. I suspect
    that you are trying to get something very simple like the frequency of
    these strings within whatever corpus they inhabit. The code you sent
    does some manipulations I can understand; others seem to be redundant
    or even discarded after they are performed. For instance, you write
    the result file twice, line by line. You also try to access the
    element "matrixdata$ID" when, as far as I can see, it doesn't exist.
    That would certainly stop the script. Without knowing what is supposed
    to be the result of this, it is impossible to even analyze code that
    runs (for quite a few minutes) and does not appear to produce any
    output.


    Jim
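
Two quick checks along these lines, as a rough sketch on a small made-up data frame (the column names doc, apple and pear are invented, and the real file may look different): test whether a column exists before using it, and measure how sparse the counts are.

```r
# Small made-up data frame standing in for the real term counts
matrixdata <- data.frame(doc = 1:3, apple = c(0, 2, 0), pear = c(0, 0, 1))

# `$` on a missing data frame column returns NULL, and later code
# that uses the result may then fail
has_id <- "ID" %in% names(matrixdata)
missing_col <- is.null(matrixdata$ID)

# Share of zero cells among the count columns, as a rough sparsity measure
counts <- as.matrix(matrixdata[, -1])
zero_share <- mean(counts == 0)

print(has_id)
print(zero_share)
```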
