Hello,
I want to do a cluster analysis with my data. The problem is that the
variables don't consist of single values; instead, the entries are pairs of
values.
That looks like this:

Variable1:  Variable2:  Variable3:  . . .
(1,2)       (1,5)       (4,2)
(7,8)       (3,88)      (6,5)
(4,7)       (12,4)      (4,4)
. . .       . . .       . . .
Is it possible to perform a cluster analysis with this kind of data in
R?
I don't even know how to get this data into a matrix or a data frame or
anything like this.

It would be really nice if somebody could help me.

Best regards and happy Easter

Claudia


• at Apr 4, 2012 at 3:59 pm: You can create a distance matrix for each variable, square the distances, sum them,
and take the square root. As for getting the data into a data frame, the
simplest approach is to enter the three variables into six columns, like the
following:

data
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 1 5 4 2
[2,] 7 8 3 88 6 5
[3,] 4 7 12 4 4 4

Then use dist() on each pair of columns:

1:2, 3:4, 5:6 . . .

For example, for the 3 rows of data you provided:

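data <- matrix(c(1, 2,  1,  5, 4, 2,
                 7, 8,  3, 88, 6, 5,
                 4, 7, 12,  4, 4, 4), nrow = 3, byrow = TRUE)
# start from an all-zero dist object with one entry per pair of rows
dm <- dist(matrix(0, nrow = nrow(data), ncol = 1))
# add the squared distances computed on each pair of columns (1:2, 3:4, ...)
for(i in seq(1, ncol(data), 2)) {
  dm <- dm + dist(data[, i:(i+1)])^2
}
dm <- sqrt(dm)
dm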
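For plain Euclidean distance, the per-pair construction gives the same result as dist() on all six columns: the squared Euclidean distance is a sum over coordinates, so it is also a sum over the three pairs. A minimal check, assuming the same 3 x 6 matrix and written with Reduce() instead of an explicit loop:

```r
# Same 3 x 6 matrix: columns 1:2, 3:4 and 5:6 hold the three pairs
data <- matrix(c(1, 2,  1,  5, 4, 2,
                 7, 8,  3, 88, 6, 5,
                 4, 7, 12,  4, 4, 4), nrow = 3, byrow = TRUE)

# Squared Euclidean distance object for each pair of columns
per_pair <- lapply(seq(1, ncol(data), by = 2),
                   function(i) dist(data[, i:(i + 1)])^2)

# Sum the squared distances over pairs, then take the square root
dm <- sqrt(Reduce(`+`, per_pair))

# Compare with the ordinary Euclidean distance over all columns at once
all.equal(as.vector(dm), as.vector(dist(data)))
```

The per-pair form stays useful if each pair should get its own weight or its own distance measure.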

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Sent: Wednesday, April 04, 2012 6:32 AM
To: r-help at r-project.org
Subject: [R] cluster analysis with pairwise data

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
and provide commented, minimal, self-contained, reproducible code.
• at Apr 4, 2012 at 4:12 pm: On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:
Hi.

The data as they are may be read into R as character data. The
exact way depends on the format of the data in the file. The
result may look like the following.

Var1 <- c("(1,2)", "(7,8)", "(4,7)")
Var2 <- c("(1,5)", "(3,88)", "(12,4)")
Var3 <- c("(4,2)", "(6,5)", "(4,4)")
DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)
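How the file is read depends on its layout. As a minimal sketch, assuming a hypothetical whitespace-separated file with a header line (given inline via text= so the snippet runs on its own), read.table() can produce the same character data frame directly:

```r
# Hypothetical file contents: a header line, then whitespace-separated pairs
txt <- "Var1 Var2 Var3
(1,2) (1,5) (4,2)
(7,8) (3,88) (6,5)
(4,7) (12,4) (4,4)"

# colClasses = "character" keeps each pair as a plain string (no factors)
DF <- read.table(text = txt, header = TRUE, colClasses = "character")
str(DF)
```

For a real file, the text argument would be replaced by a file name, e.g. read.table("pairs.txt", header = TRUE, colClasses = "character"), where pairs.txt is only a placeholder.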

If you want to use a distance between pairs that depends on the
numbers (and not only on whether two pairs are equal or different), then the data should
be transformed to a numeric format. For example, as follows:

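trans <- function(x)
{
  # drop the parentheses, then split each "a,b" string at the comma
  y <- strsplit(gsub("[()]", "", x), ",")
  # convert each piece to a length-2 numeric vector; stack into an n x 2 matrix
  unname(t(vapply(y, FUN=as.numeric, FUN.VALUE=c(0, 0))))
}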

DF <- data.frame(Var1=trans(Var1), Var2=trans(Var2), Var3=trans(Var3))
DF

  Var1.1 Var1.2 Var2.1 Var2.2 Var3.1 Var3.2
1      1      2      1      5      4      2
2      7      8      3     88      6      5
3      4      7     12      4      4      4

Then, see library(help=cluster).
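As one possible next step (the thread itself only points to the cluster package), here is a minimal sketch that clusters the numeric data frame above with base R's hclust(). The values are typed in directly so the snippet runs on its own, and k = 2 groups is an arbitrary choice for illustration:

```r
# Numeric version of the thread's data: one column per pair component
DF <- data.frame(Var1.1 = c(1, 7, 4),  Var1.2 = c(2, 8, 7),
                 Var2.1 = c(1, 3, 12), Var2.2 = c(5, 88, 4),
                 Var3.1 = c(4, 6, 4),  Var3.2 = c(2, 5, 4))

# Euclidean distances between rows (observations)
d <- dist(DF)

# Agglomerative clustering with complete linkage (the hclust() default)
hc <- hclust(d)

# Cut the tree into two groups
groups <- cutree(hc, k = 2)
groups
```

dist() does not rescale the columns, so the large value 88 dominates the distances here; scale(DF) could be applied first if that is not wanted. cluster::agnes() and cluster::pam() also accept a dist object if you prefer the functions in the cluster package.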

Hope this helps.

Petr Savicky.
• at Apr 4, 2012 at 4:59 pm: On Wed, Apr 4, 2012 at 10:12 AM, Petr Savicky wrote:

If you want to use a distance between pairs depending on the
numbers (and not only equal/different pair), then the data should
to be transformed to a numeric format.
Or, if the pairs are just labels (so only equal/different matters), ?daisy, also in the cluster
package, comes in handy (in this case you'll want to keep the Vi as
factors when creating DF).
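A minimal sketch of that suggestion, assuming the pairs are purely categorical. It uses the thread's three rows plus one hypothetical fourth row (not part of the original data) so that not every dissimilarity comes out as 1; daisy() switches to Gower's coefficient automatically when columns are not numeric:

```r
library(cluster)  # recommended package, normally installed with R

# The thread's three rows plus a hypothetical fourth row that repeats some pairs
Var1 <- c("(1,2)", "(7,8)", "(4,7)", "(1,2)")
Var2 <- c("(1,5)", "(3,88)", "(12,4)", "(3,88)")
Var3 <- c("(4,2)", "(6,5)", "(4,4)", "(9,9)")

# Keep each pair as a factor level: only "same pair" vs "different pair" counts
DF <- data.frame(Var1, Var2, Var3, stringsAsFactors = TRUE)

# For factor columns, Gower's dissimilarity is the share of variables
# on which two rows carry different pairs
d <- daisy(DF)
m <- as.matrix(d)
round(m, 3)
```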

Cheers



Discussion Overview
group: r-help; categories: r; posted: Apr 4, '12 at 11:32a; active: Apr 4, '12 at 4:59p; posts: 4; users: 4; website: r-project.org; irc: #r


site design / logo © 2022 Grokbase