r-help, April 2012: cluster analysis with pairwise data
Hello,
I want to do a cluster analysis with my data. The problem is that the
variables don't consist of single values; each entry is a pair of
values.
It looks like this:

Variable 1  Variable 2  Variable 3  ...
(1,2)       (1,5)       (4,2)
(7,8)       (3,88)      (6,5)
(4,7)       (12,4)      (4,4)
...         ...         ...

Is it possible to perform a cluster analysis with this kind of data in
R?
I don't even know how to get this data into a matrix or a data frame or
anything like this.

It would be really nice if somebody could help me.

Best regards and happy Easter

Claudia


  • David L Carlson at Apr 4, 2012 at 3:59 pm
    You can compute a Euclidean distance matrix for each variable (i.e. for
    each pair of columns), square the distances, sum them over the variables,
    and take the square root. As for getting the data into a data frame, the
    simplest approach is to enter the three variables as six columns, like
    this:

    data
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    2    1    5    4    2
    [2,]    7    8    3   88    6    5
    [3,]    4    7   12    4    4    4

    Then use dist() on each pair of columns:

    1:2, 3:4, 5:6 . . .

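    For example, for the 3 rows of data you provided:

    # 'data' holds the matrix shown above: columns 1:2, 3:4, 5:6 are the three variables
    data <- matrix(c(1, 2, 1,  5, 4, 2,
                     7, 8, 3, 88, 6, 5,
                     4, 7, 12, 4, 4, 4), nrow = 3, byrow = TRUE)
    # start from an all-zero distance object with one entry per pair of rows
    dm <- dist(matrix(0, nrow(data), 1))
    for (i in seq(1, ncol(data), by = 2)) {
      # add the squared Euclidean distances computed on variable (i, i+1)
      dm <- dm + dist(data[, i:(i + 1)])^2
    }
    dm <- sqrt(dm)
    dm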
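    Summing the squared per-variable Euclidean distances and taking the square root
    gives exactly the ordinary Euclidean distance over all six columns, so dist(data)
    should return the same values in a single call. A quick check on the three
    example rows (the name dm_all is just for illustration):

```r
# the three example rows; columns 1:2, 3:4, 5:6 are the three variables
data <- matrix(c(1, 2, 1,  5, 4, 2,
                 7, 8, 3, 88, 6, 5,
                 4, 7, 12, 4, 4, 4), nrow = 3, byrow = TRUE)

# per-variable approach: accumulate squared distances over the column pairs
dm <- dist(matrix(0, nrow(data), 1))
for (i in seq(1, ncol(data), by = 2)) {
  dm <- dm + dist(data[, i:(i + 1)])^2
}
dm <- sqrt(dm)

# plain Euclidean distance over all columns at once
dm_all <- dist(data)

same <- isTRUE(all.equal(as.vector(dm), as.vector(dm_all)))
same
```

    The per-variable loop only pays off if you want something other than plain
    Euclidean distance, e.g. a different metric or weight per variable.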

    ----------------------------------------------
    David L Carlson
    Associate Professor of Anthropology
    Texas A&M University
    College Station, TX 77843-4352



    -----Original Message-----
    From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
    Behalf Of paladini
    Sent: Wednesday, April 04, 2012 6:32 AM
    To: r-help at r-project.org
    Subject: [R] cluster analysis with pairwise data

    ______________________________________________
    R-help at r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/r-help
    PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    and provide commented, minimal, self-contained, reproducible code.
  • Petr Savicky at Apr 4, 2012 at 4:12 pm

    On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:
    Hi.

    The data as they are may be read into R as character data. The
    exact way depends on the format of the data in the file. The
    result may look like the following.

    Var1 <- c("(1,2)", "(7,8)", "(4,7)")
    Var2 <- c("(1,5)", "(3,88)", "(12,4)")
    Var3 <- c("(4,2)", "(6,5)", "(4,4)")
    DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)
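    As one possibility, assuming (the question does not say) that the pairs are
    stored in a whitespace-separated text file with a header line, read.table()
    can read them straight into character columns. Here text = stands in for a
    real file; the file name "pairs.txt" in the comment is hypothetical:

```r
# assumed layout: one header line, then one row per object, pairs separated by whitespace
txt <- "Var1  Var2   Var3
(1,2) (1,5)  (4,2)
(7,8) (3,88) (6,5)
(4,7) (12,4) (4,4)"

# with a real file, use e.g. read.table("pairs.txt", header = TRUE, stringsAsFactors = FALSE)
DF <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(DF)
```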

    If you want to use a distance between pairs that depends on the
    numbers (and not only on whether two pairs are equal or different), then
    the data should be transformed to a numeric format. For example, as follows

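    trans <- function(x)
    {
        # drop the parentheses, split each "a,b" at the comma,
        # and return an n x 2 numeric matrix (one row per entry)
        y <- strsplit(gsub("[()]", "", x), ",")
        unname(t(vapply(y, FUN=as.numeric, FUN.VALUE=c(0, 0))))
    }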

    DF <- data.frame(Var1=trans(Var1), Var2=trans(Var2), Var3=trans(Var3))
    DF

      Var1.1 Var1.2 Var2.1 Var2.2 Var3.1 Var3.2
    1      1      2      1      5      4      2
    2      7      8      3     88      6      5
    3      4      7     12      4      4      4
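    With many variables, naming each one in data.frame() gets tedious. A sketch
    that applies the same trans() to every column of the character data frame
    and binds the n x 2 blocks side by side; the matrix name X and the
    "<variable>.1"/"<variable>.2" column names are just illustrative choices:

```r
trans <- function(x) {
  # "(a,b)" -> numeric pair; returns an n x 2 matrix
  y <- strsplit(gsub("[()]", "", x), ",")
  unname(t(vapply(y, FUN = as.numeric, FUN.VALUE = c(0, 0))))
}

DF <- data.frame(Var1 = c("(1,2)", "(7,8)", "(4,7)"),
                 Var2 = c("(1,5)", "(3,88)", "(12,4)"),
                 Var3 = c("(4,2)", "(6,5)", "(4,4)"),
                 stringsAsFactors = FALSE)

# convert every column, then bind the resulting n x 2 matrices column-wise
X <- do.call(cbind, lapply(DF, trans))
colnames(X) <- paste(rep(names(DF), each = 2), 1:2, sep = ".")
X
```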

    Then, see library(help=cluster).
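    As a minimal illustration only (the cluster package also offers agnes(),
    pam() and others), hierarchical clustering with base R's hclust() on the
    Euclidean distances of the six numeric columns; average linkage and k = 2
    are arbitrary choices here:

```r
# numeric version of the example: one pair of columns per variable
X <- matrix(c(1, 2, 1,  5, 4, 2,
              7, 8, 3, 88, 6, 5,
              4, 7, 12, 4, 4, 4), nrow = 3, byrow = TRUE)

d  <- dist(X)                          # Euclidean distances between rows
hc <- hclust(d, method = "average")    # agglomerative clustering, average linkage
cl <- cutree(hc, k = 2)                # cut the tree into 2 groups
cl
```

    With these three rows, rows 1 and 3 (distance about 12.6) end up in one
    group and row 2 on its own, mostly because of the value 88.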

    Hope this helps.

    Petr Savicky.
  • Ilai at Apr 4, 2012 at 4:59 pm

    On Wed, Apr 4, 2012 at 10:12 AM, Petr Savicky wrote:
    Var1 <- c("(1,2)", "(7,8)", "(4,7)")
    Var2 <- c("(1,5)", "(3,88)", "(12,4)")
    Var3 <- c("(4,2)", "(6,5)", "(4,4)")
    DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)

    If you want to use a distance between pairs depending on the
    numbers (and not only equal/different pair), then the data should
    be transformed to a numeric format.
    Or, if each pair only has meaning as a whole (so that only equal/different
    matters), ?daisy, also in the cluster package, comes in handy (in this
    case you'll want to keep the Var columns as factors when creating DF).
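    A sketch of that route, assuming each pair is treated as an unordered
    category (two objects either share a pair or they don't). With factor
    columns, daisy() uses Gower's coefficient, which for nominal variables is
    the proportion of variables on which two rows differ:

```r
library(cluster)

# keep each pair as a category rather than converting it to numbers
DF <- data.frame(Var1 = c("(1,2)", "(7,8)", "(4,7)"),
                 Var2 = c("(1,5)", "(3,88)", "(12,4)"),
                 Var3 = c("(4,2)", "(6,5)", "(4,4)"),
                 stringsAsFactors = TRUE)

# Gower dissimilarity: share of variables whose pairs differ
d <- daisy(DF, metric = "gower")
d
```

    In the three example rows no two objects share any pair, so every
    dissimilarity is 1. The result is a dissimilarity object that can be
    passed on to agnes() or pam().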

    Cheers

