Hello everyone,

I have two sets of data, with the following structure:

DataSet1
Location  Part  Sample 1  Sample 2
A         1     value     value
A         2     value     value
A         3     value     value
B         1     value     value

DataSet2
Location  Sample 1  Sample 2
A         value     value
B         value     value
C         value     value

I would like to look at the correlations between DataSet1 and DataSet2, such
that each row in Location A from DataSet1 is paired with the Location A row
from DataSet2, and so forth. So far, my only ideas involve trying to
copy-paste each of the rows in DataSet2 the number of times each occurs in
DataSet1. As I have approaching 8000 rows in DataSet2, this is clearly not a
workable solution!

I'm sure there's a simple solution to this, so I'm sorry if this seems like
a really silly question.

Jen

--
View this message in context: http://r.789695.n4.nabble.com/correlating-rows-of-two-differently-sized-data-frames-in-R-tp4639774.html
Sent from the R help mailing list archive at Nabble.com.


• R. Michael Weylandt at Aug 9, 2012 at 4:28 pm
Perhaps load them both and ?merge can show you the way.

Michael
On Thu, Aug 9, 2012 at 9:54 AM, JenniferH wrote:
[...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
• R. Michael Weylandt at Aug 9, 2012 at 5:57 pm
Hi Jen,

It's generally best to keep cc'ing R-help so others can lend a hand
when I step away from my computer:
On Thu, Aug 9, 2012 at 11:49 AM, Jennifer Hobbs wrote:
> Hi Michael -
>
> thanks for the advice - I did find merge() just after posting but I'm having
> difficulty with using it. I've loaded both datasets; then I tried
> CombinedData<-merge(MethyData1,ExprData1)
> but when I looked at CombinedData, I found there was no actual data in it:
> str(CombinedData)
> 'data.frame': 0 obs. of 20 variables

Take a look at

?merge.data.frame

in particular since there are many different forms of merges. Your
original post suggests you may want to set

all = TRUE
by = "Location"

Hope that helps,
Michael
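
For illustration, here is a minimal sketch of the keyed merge suggested above, using toy data shaped like the example in the original post. The object names (ds1, ds2), the sample column names, and all values are invented; only the Location key comes from the thread. It also shows one way a merge() without `by` can come back empty: by default every shared column name is used as a key.

```r
# Toy stand-ins for the two data sets (all values are invented).
ds1 <- data.frame(
  Location = c("A", "A", "A", "B"),
  Part     = c(1, 2, 3, 1),
  Sample1  = c(0.1, 0.4, 0.3, 0.9),
  Sample2  = c(0.2, 0.5, 0.1, 0.7)
)
ds2 <- data.frame(
  Location = c("A", "B", "C"),
  Sample1  = c(5.0, 6.0, 7.0),
  Sample2  = c(5.5, 6.5, 7.5)
)

# Default merge: the keys are ALL shared column names (Location,
# Sample1, Sample2). No pair of rows agrees on all three, so the
# result has zero rows.
empty <- merge(ds1, ds2)
nrow(empty)

# Keyed merge: match on Location only. Clashing column names get the
# given suffixes, so both sets of sample values sit side by side.
paired <- merge(ds1, ds2, by = "Location", suffixes = c(".ds1", ".ds2"))
nrow(paired)

# all = TRUE additionally keeps Location "C", padded with NA.
outer <- merge(ds1, ds2, by = "Location", all = TRUE)
nrow(outer)
```

Whether all = TRUE is wanted depends on whether locations present in only one data set should survive the merge; for pairing each DataSet1 row with its DataSet2 row, the keyed merge alone may be enough.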

> I thought this might be due to the fact that my column names, as well as the
> row names, in both data sets were the same, so I renamed the column names in
> ExprData1 and tried again:
> colnames(ExprData1)<-NewExprNames
> merge(ExprData1,MethyData1)
> Error: cannot allocate vector of size 4.2 Gb
> 1: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
> 2: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
> 3: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
> 4: In expand.grid(seq_len(nx), seq_len(ny)) :
>   Reached total allocation of 8055Mb: see help(memory.size)
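
The expand.grid() calls in these warnings are consistent with merge() finding no shared column names at all: with nothing to match on, it pairs every row of one data frame with every row of the other. A small sketch with invented data frames (x and y are hypothetical names, not objects from the thread) shows how the result grows:

```r
# Two toy data frames with no column names in common (invented data).
x <- data.frame(a = 1:3)
y <- data.frame(b = 1:4)

# With no shared names, merge() falls back to a cross join:
# every row of x paired with every row of y.
cross <- merge(x, y)
dim(cross)

# The row count is the product of the two inputs, so tables with
# thousands of rows each can exhaust memory very quickly.
nrow(cross) == nrow(x) * nrow(y)
```

Renaming the columns so that nothing overlaps therefore makes the problem worse; keeping one shared key column and naming it in `by` avoids the cross join.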

You'll also need to be using a 64 bit build of R. Merging is pretty
memory expensive, so if you're right on the edge of what R can handle
you might have to look into a more specialized solution (such as an
SQL backend).

> [...] managed to deal with much larger data sets before now (I know that's
> not the only criterion, but my understanding of computers isn't extensive).
> I had previously run up against a memory problem because I hadn't transposed
> my data (I thought I was looking at columns, the computer was looking at
> rows), so I tried transposing both data sets and merging again, but I end up
> with another empty data frame:
> tED1<-t(ExprData1)
> tMD1<-t(MethyData1)
> CombineData<-merge(tED1,tMD1)
> str(CombineData)
> 'data.frame': 0 obs. of 152247 variables:
>
> This is where I'm stuck. Any advice would be hugely appreciated!
>
> Jen
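
If the goal is only to line each DataSet1 row up with its Location row in DataSet2 and correlate the two across samples, a lighter-weight alternative to merge() may be match(), which looks up row positions without building a combined table. This is only a sketch under assumptions: the toy objects, the three sample columns, and the choice of Pearson correlation via cor() are illustrative and not taken from the thread.

```r
# Toy data (invented): rows are locations/parts, columns are samples.
ds1 <- data.frame(
  Location = c("A", "A", "B"),
  Part     = c(1, 2, 1),
  S1 = c(1, 3, 2), S2 = c(2, 2, 4), S3 = c(3, 1, 6)
)
ds2 <- data.frame(
  Location = c("A", "B", "C"),
  S1 = c(10, 1, 0), S2 = c(20, 2, 0), S3 = c(30, 3, 1)
)
samples <- c("S1", "S2", "S3")

# For each ds1 row, the index of the ds2 row with the same Location
# (NA if that location is missing from ds2).
idx <- match(ds1$Location, ds2$Location)

# Numeric matrices; ds2 rows are repeated once per matching ds1 row.
m1 <- as.matrix(ds1[, samples])
m2 <- as.matrix(ds2[idx, samples])

# Correlate each ds1 row with its partner row across the samples.
ds1$r <- vapply(
  seq_len(nrow(m1)),
  function(i) cor(m1[i, ], m2[i, ]),
  numeric(1)
)
ds1[, c("Location", "Part", "r")]
```

Indexing with match() does the "copy each DataSet2 row as many times as needed" step automatically, and it never compares the sample columns themselves, so identical column names in the two data sets are not a problem.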

• Jennifer Hobbs at Aug 10, 2012 at 9:27 am
Hi Michael,

sorry, didn't mean to hit "reply" rather than "reply all". Thanks for your

Jen
On Thu, Aug 9, 2012 at 6:57 PM, R. Michael Weylandt wrote:

[...]


Discussion Overview
group: r-help; categories: r; posted: Aug 9, '12 at 2:54p; active: Aug 10, '12 at 9:27a; posts: 4; users: 2; website: r-project.org; irc: #r
