Hi Bert;

I really appreciate this. I need to check your code for task 2 tomorrow at my office on the real data, as I have difficulty connecting remotely (because of a technical issue). I am sure it will work well.

I am sorry that I was not able to explain my first question clearly. Basically, the values in the ref data represent regions of a chromosome. I need to select these regions in map (all region values in ref also exist in the first column of map, map$reg), then sum up the column map$rate and count the number of rows needed to get >0.85.

For example, consider the first row of ref: 29220 and 63933. After sorting map by its first column, sum map$rate only over the rows between 29220 and 63933 in the sorted map, cut off at >0.85, and count how many rows of the sorted map that takes. For example, suppose there are 38 rows between 29220 and 63933 in the sorted map$reg, and summing only the first 12 of them gives >0.85. Then my answer is 12 for 29220 - 63933 in ref.
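For a single ref interval, the counting rule described above can be sketched as follows. This sketch assumes map has numeric reg and rate columns and treats both endpoints as inclusive. The toy values are made up and are not taken from the real data.

```r
# Hypothetical toy data: reg positions (unsorted) and their rates
map <- data.frame(reg  = c(50, 10, 30, 20, 40, 70),
                  rate = c(0.10, 0.30, 0.20, 0.40, 0.15, 0.50))

# Sort map by reg, as described in the message
map <- map[order(map$reg), ]

# Keep only the rows whose reg lies in [reg1, reg2] (endpoints included)
reg1 <- 10
reg2 <- 50
inside <- map$rate[map$reg >= reg1 & map$reg <= reg2]

# Number of leading rows needed for the running sum of rate to exceed 0.85;
# which(...)[1] gives NA if the whole interval never gets there
n <- which(cumsum(inside) > 0.85)[1]
n
```

Here the running sums are 0.30, 0.70, 0.90, ..., so the cutoff is first exceeded at the third row. On the real data, watch for NA results, which mean the threshold is never reached inside an interval.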

Thanks a lot for your patience.

Cheers,

Greg

On Sun, Jun 12, 2016 at 10:35 PM, greg holly wrote:

On Sun, Jun 12, 2016 at 6:36 PM, Bert Gunter wrote:

Greg:

I was not able to understand your task 1. Perhaps others can.

My understanding of your task 2 is that for each row of ref, you wish to find all rows of map such that the reg values in those rows fall between the reg1 and reg2 values in ref (inclusive; change <= to < if you don't want the endpoints), and then you want the minimum map$p value over all those rows. If that is correct, I believe this will do it (but caution, untested, as you failed to provide data in a convenient form, e.g. using dput()):

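# For each row i of ref, take the minimum p over the map rows whose reg
# lies in [ref[i, 1], ref[i, 2]]; the final 0 tells vapply() to expect
# a single number from each call
task2 <- with(map, vapply(seq_len(nrow(ref)), function(i)
  min(p[ref[i, 1] <= reg & reg <= ref[i, 2]]), 0))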

If my understanding is incorrect, please ignore both the above and the

following:

The "solution" I have given above seems inefficient, so others may be able to improve it significantly if you find that it takes too long. OTOH, my understanding of your specification is that you need to search for all rows in the map data frame that meet the criterion for each row of ref, and without further information, I don't know how to do this without just repeating the search 560 times.
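One way to avoid scanning all of map for every ref row is to sort map by reg once. You can then use findInterval() to find where each [reg1, reg2] interval starts and ends in that sorted order, so each interval becomes a contiguous block of rows. This sketch assumes reg is numeric. The toy data are hypothetical, and the approach has not been tried on 27 million rows.

```r
# Hypothetical toy data standing in for the real map and ref
map <- data.frame(reg = c(50, 10, 30, 20, 40, 70, 30),
                  p   = c(0.5, 0.7, 0.4, 0.9, 0.2, 0.1, 0.6))
ref <- data.frame(reg1 = c(10, 30), reg2 = c(40, 70))

# Sort map by reg once, so every interval is a contiguous block of rows
ord <- order(map$reg)
reg <- map$reg[ord]
p   <- map$p[ord]

# First row with reg >= reg1: number of values strictly below reg1, plus one
lo <- findInterval(ref$reg1, reg, left.open = TRUE) + 1
# Last row with reg <= reg2: number of values at or below reg2
hi <- findInterval(ref$reg2, reg)

# Minimum p over each block; NA if no map row falls inside the interval
task2 <- mapply(function(a, b) if (a <= b) min(p[a:b]) else NA_real_, lo, hi)
task2
```

For an interval containing no map rows, this returns NA. By contrast, min() over an empty selection, as in the vapply() version, returns Inf with a warning.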

Cheers,

Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Sun, Jun 12, 2016 at 1:14 PM, greg holly wrote:

Dear all;

I have two data sets, map and ref. A small part of each is given below. The map data has more than 27 million rows and ref has about 560 rows. Basically, I need to run two different tasks. My R code for these tasks is given below, but it does not work properly.

I sincerely appreciate your help.

Regards,

Greg

Task 1)

For example, the first and second columns for row 1 in ref are 29220 and 63933. So I need to write R code that first looks at the first row in ref (29220 and 63933), then sums the column map$rate and gives the number of rows needed to reach >0.85. Then it should do the same for the second, third, ... rows in ref. At the end I would like the table given below (the results I need). Please note that all values in the ref data file also exist in the map$reg column.
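Applied to every row of ref, the rule in Task 1 gives a table shaped like "the results I need". The sketch below builds that table under some assumptions. It uses reg order within each interval with inclusive endpoints, as described in the follow-up message. The helper count_to_cutoff() is a hypothetical name, and the toy data are made up.

```r
# Hypothetical toy data; the real map has ~27 million rows and ref ~560
map <- data.frame(reg  = c(50, 10, 30, 20, 40, 70, 60),
                  rate = c(0.10, 0.30, 0.20, 0.40, 0.15, 0.50, 0.45))
ref <- data.frame(reg1 = c(10, 40), reg2 = c(50, 70))

# Sort map by reg once
map <- map[order(map$reg), ]

# Hypothetical helper: for one interval, how many leading rows (in reg order)
# are needed before the running sum of rate exceeds the cutoff; NA if never
count_to_cutoff <- function(reg1, reg2, cutoff = 0.85) {
  rate <- map$rate[map$reg >= reg1 & map$reg <= reg2]
  which(cumsum(rate) > cutoff)[1]
}

# One count per ref row, added as column n
ref$n <- mapply(count_to_cutoff, ref$reg1, ref$reg2)
ref
```

Like the vapply() answer for task 2, this scans map once per ref row, so it may be slow on the full data.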

Task 2)

Again, as an example, the first and second columns for row 1 in ref are 29220 and 63933. So I need to write R code that gives the minimum map$p for the 29220-63933 interval in the map file, then does the same for the second, third, ... rows in ref.

#my attempt for the first question
temp<-map[order(map$reg, map$p),]
count<-1
temp<-unique(temp$reg
for(i in 1:length(ref) {
for(j in 1:length(ref)
{
temp1<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,]
& temp[cumsum(temp$rate) > 0.70,])
count=count+1
}
}

#my attempt for the second question
temp<-map[order(map$reg, map$p),]
count<-1
temp<-unique(temp$reg
for(i in 1:length(ref) {
for(j in 1:length(ref)
{
temp2<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,])
output<-temp2[temp2$p==min(temp2$p),]
}
}

Data sets

Data = map

reg   p     rate
10276 0.700 3.867e-18
71608 0.830 4.542e-16
29220 0.430 1.948e-15
99542 0.220 1.084e-15
26441 0.880 9.675e-14
95082 0.090 7.349e-13
36169 0.480 9.715e-13
55572 0.500 9.071e-12
65255 0.300 1.688e-11
51960 0.970 1.163e-10
55652 0.388 3.750e-10
63933 0.250 9.128e-10
35170 0.720 7.355e-09
06491 0.370 1.634e-08
85508 0.470 1.057e-07
86666 0.580 7.862e-07
04758 0.810 9.501e-07
06169 0.440 1.104e-06
63933 0.750 2.624e-06
41838 0.960 8.119e-06

Data = ref

reg1  reg2
29220 63933
26441 41838
06169 10276
74806 92643
73732 82451
86042 93502
85508 95082

The results I need:

reg1  reg2  n
29220 63933 12
26441 41838 78
06169 10276 125
74806 92643 11
73732 82451 47
86042 93502 98
85508 95082 219
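The sample rows above can be turned into a self-contained, reproducible example, which is the "convenient form" the reply asks for. This sketch reads the rows with read.table(text = ...), and dput() then prints code that recreates an object exactly. Note that reg is read as a number here, so leading zeros such as 06491 are dropped. That does not affect the numeric comparisons.

```r
# Sample rows from the message above, read from inline text
map <- read.table(header = TRUE, text = "
reg   p     rate
10276 0.700 3.867e-18
71608 0.830 4.542e-16
29220 0.430 1.948e-15
99542 0.220 1.084e-15
26441 0.880 9.675e-14
95082 0.090 7.349e-13
36169 0.480 9.715e-13
55572 0.500 9.071e-12
65255 0.300 1.688e-11
51960 0.970 1.163e-10
55652 0.388 3.750e-10
63933 0.250 9.128e-10
35170 0.720 7.355e-09
06491 0.370 1.634e-08
85508 0.470 1.057e-07
86666 0.580 7.862e-07
04758 0.810 9.501e-07
06169 0.440 1.104e-06
63933 0.750 2.624e-06
41838 0.960 8.119e-06
")

ref <- read.table(header = TRUE, text = "
reg1  reg2
29220 63933
26441 41838
06169 10276
74806 92643
73732 82451
86042 93502
85508 95082
")

# dput() prints R code that recreates the object; paste its output into a post
dput(ref)
```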


______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
