Grokbase Groups R r-help August 2012
Hello everyone,

I have two sets of data, with the following structure:

DataSet1
Location Part Sample 1 Sample 2
A 1 value value
A 2 value value
A 3 value value
B 1 value value

DataSet2
Location Sample 1 Sample 2
A value value
B value value
C value value

I would like to look at the correlations between DataSet1 and DataSet2, such
that each row in Location A from DataSet1 is paired with the Location A row
from DataSet2, and so forth. So far, my only ideas involve trying to
copy-paste each of the rows in DataSet2 the number of times each occurs in
DataSet1 on a spreadsheet before loading the sets into R; however, as I have
approaching 8000 rows in DataSet2, this is clearly not a workable solution!

I'm sure there's a simple solution to this, so I'm sorry if this seems like
a really silly question.

Thanks for your help!

Jen



--
View this message in context: http://r.789695.n4.nabble.com/correlating-rows-of-two-differently-sized-data-frames-in-R-tp4639774.html
Sent from the R help mailing list archive at Nabble.com.


  • R. Michael Weylandt at Aug 9, 2012 at 4:28 pm
    Perhaps load them both and ?merge can show you the way.

    Michael
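
    A minimal sketch of what the merge() suggestion could look like for the
    structure described in the original post. The data frames ds1 and ds2,
    their values, and the third sample column are all made up for
    illustration (with only two samples every correlation would be +1 or -1);
    the real data will have different names and many more columns.

    ```r
    # Toy stand-ins for DataSet1 and DataSet2 (hypothetical names and values).
    ds1 <- data.frame(
      Location = c("A", "A", "A", "B"),
      Part     = c(1, 2, 3, 1),
      Sample1  = c(0.1, 0.4, 0.3, 0.9),
      Sample2  = c(0.2, 0.5, 0.1, 0.8),
      Sample3  = c(0.3, 0.9, 0.2, 0.4)
    )
    ds2 <- data.frame(
      Location = c("A", "B", "C"),
      Sample1  = c(1.0, 5.0, 2.0),
      Sample2  = c(2.0, 4.0, 2.5),
      Sample3  = c(3.5, 1.0, 3.0)
    )

    # Join on Location only. Each ds1 row is paired with the single ds2 row
    # for its Location; the clashing sample columns receive the suffixes.
    paired <- merge(ds1, ds2, by = "Location", suffixes = c(".ds1", ".ds2"))

    samples <- c("Sample1", "Sample2", "Sample3")
    x <- as.matrix(paired[, paste0(samples, ".ds1")])
    y <- as.matrix(paired[, paste0(samples, ".ds2")])

    # One correlation per paired row, computed across the samples.
    paired$r <- sapply(seq_len(nrow(paired)), function(i) cor(x[i, ], y[i, ]))
    paired[, c("Location", "Part", "r")]
    ```

    With the default all = FALSE, Location C is dropped because it has no
    partner in ds1, so no manual copy-pasting of rows is needed.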
  • R. Michael Weylandt at Aug 9, 2012 at 5:57 pm
    Hi Jen,

    It's generally best to keep cc'ing R-help so others can lend a hand
    when I step away from my computer:

    On Thu, Aug 9, 2012 at 11:49 AM, Jennifer Hobbs wrote:
    > Hi Michael -
    >
    > thanks for the advice - I did find merge() just after posting but I'm having
    > difficulty with using it. I've loaded both datasets; then I tried
    > CombinedData<-merge(MethyData1,ExprData1)
    > but when I looked at CombinedData, I found there was no actual data in it:
    > str(CombinedData)
    > 'data.frame': 0 obs. of 20 variables

    Take a look at

    ?merge.data.frame

    in particular, since there are many different forms of merges. Your
    original post suggests you may want to set

    all = TRUE
    by = "Location"

    Hope that helps,
    Michael

    > I thought this might be due to the fact that my column names, as well as the
    > row names, in both data sets were the same, so I renamed the column names in
    > ExprData1 and tried again:
    > colnames(ExprData1)<-NewExprNames
    > merge(ExprData1,MethyData1)
    > Error: cannot allocate vector of size 4.2 Gb
    > In addition: Warning messages:
    > 1: In expand.grid(seq_len(nx), seq_len(ny)) :
    > Reached total allocation of 8055Mb: see help(memory.size)
    > 2: In expand.grid(seq_len(nx), seq_len(ny)) :
    > Reached total allocation of 8055Mb: see help(memory.size)
    > 3: In expand.grid(seq_len(nx), seq_len(ny)) :
    > Reached total allocation of 8055Mb: see help(memory.size)
    > 4: In expand.grid(seq_len(nx), seq_len(ny)) :
    > Reached total allocation of 8055Mb: see help(memory.size)
    >
    > I was surprised about this, as I'm using a 64-bit computer and it's managed
    > to deal with much larger data sets before now (I know that's not the only
    > criterion, but my understanding of computers isn't extensive).

    You'll also need to be using a 64-bit build of R. Merging is pretty
    memory-expensive, so if you're right on the edge of what R can handle
    you might have to look into a more specialized solution (such as an
    SQL backend).

    > I had previously run up against a memory problem because I hadn't
    > transformed my data (I thought I was looking at columns, the computer was
    > looking at rows) so I tried transforming both data sets and merging again,
    > but I end up with another empty data frame:
    > tED1<-t(ExprData1)
    > tMD1<-t(MethyData1)
    > CombineData<-merge(tED1,tMD1)
    > str(CombineData)
    > 'data.frame': 0 obs. of 152247 variables:
    >
    > This is where I'm stuck. Any advice would be hugely appreciated!
    >
    > Jen
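
    The two symptoms reported above (an empty result, then a memory error
    raised inside expand.grid) look consistent with how merge() picks its
    key when by is not given: it uses every column name the two data frames
    share. The sketch below reproduces both cases on tiny made-up data
    frames; ds1, ds2 and their column names are hypothetical, and whether
    this is exactly what happened with MethyData1 and ExprData1 is an
    assumption.

    ```r
    # Hypothetical miniature data frames; only the structure matters.
    ds1 <- data.frame(Location = c("A", "A", "B"), S1 = c(1, 2, 3), S2 = c(4, 5, 6))
    ds2 <- data.frame(Location = c("A", "B", "C"), S1 = c(7, 8, 9), S2 = c(1, 2, 3))

    # Case 1: every column name is shared, so merge() matches on all of them.
    # No pair of rows agrees on Location, S1 and S2 at once, hence zero rows.
    intersect(names(ds1), names(ds2))
    nrow(merge(ds1, ds2))

    # Case 2: no column name is shared, so merge() returns every pairing of
    # rows (a Cartesian product), which can exhaust memory on large inputs.
    ds2_renamed <- setNames(ds2, c("Loc2", "E1", "E2"))
    nrow(merge(ds1, ds2_renamed))   # nrow(ds1) * nrow(ds2) rows

    # Naming the key explicitly yields one output row per matching ds1 row.
    nrow(merge(ds1, ds2, by = "Location"))

    # all = TRUE additionally keeps unmatched keys (here C), filled with NA.
    nrow(merge(ds1, ds2, by = "Location", all = TRUE))
    ```

    If the location identifiers are stored as row names rather than in a
    column, merge() also accepts by = "row.names" (or by = 0).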

  • Jennifer Hobbs at Aug 10, 2012 at 9:27 am
    Hi Michael,

    sorry, didn't mean to hit "reply" rather than "reply all". Thanks for your
    advice, will try that!

    Jen

Thread posted Aug 9, 2012 at 2:54 pm; last active Aug 10, 2012 at 9:27 am; 4 posts by 2 users.