Hierarchical

Hierarchical Clustering

For hierarchical clustering we will use the iris dataset. To keep the dendrogram from getting crowded, we will work with a random sample of 30 instances.

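# Randomly pick 30 row indices (call set.seed() first for a reproducible sample)
indices <- sample(1:nrow(iris), 30)
# Keep the four numeric measurements; column 5 is the Species label
dataset <- iris[indices, -5]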

To do hierarchical clustering we will use the hclust function, which takes a distance matrix such as the one returned by dist (Euclidean distance by default):

hc <- hclust(dist(dataset), method = "average")
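To get a feel for what hclust returns, the following minimal sketch inspects a few components of the result. The seed value and the variable names are assumptions added here so the example runs on its own; they are not part of the original analysis.

```r
# Assumed seed, only so that the sample is reproducible
set.seed(42)
indices <- sample(seq_len(nrow(iris)), 30)
dataset <- iris[indices, -5]

hc <- hclust(dist(dataset), method = "average")

# Each row of merge describes one agglomeration step:
# negative entries are single observations, positive entries are earlier merges
head(hc$merge)

# Distance at which each merge happened
head(hc$height)

# Left-to-right order of the observations in the dendrogram
hc$order
```

With 30 observations there are 29 merge steps, so hc$merge has 29 rows and hc$height has 29 entries.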

The method argument defines how the distance between two clusters is measured. The options include:

the distance between the closest points of the two clusters (single)
the distance between the farthest points of the two clusters (complete)
the average distance over all pairs of points, one from each cluster (average)
the distance between the centroids of the two clusters (centroid)
and others (see ?hclust for the full list)
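The definitions above can be checked by hand on a toy example. The sketch below uses two small made-up clusters (the points are hypothetical, chosen only for illustration) and computes each between-cluster distance directly rather than through hclust.

```r
# Two tiny hypothetical clusters in 2D (made-up points)
a <- rbind(c(0, 0), c(0, 1))
b <- rbind(c(3, 0), c(4, 1))

# Euclidean distance between every point of a (rows) and every point of b (columns)
pairwise <- as.matrix(dist(rbind(a, b)))[1:2, 3:4]

single_d   <- min(pairwise)   # closest pair of points
complete_d <- max(pairwise)   # farthest pair of points
average_d  <- mean(pairwise)  # mean over all cross-cluster pairs

# Distance between the two cluster centroids
centroid_d <- sqrt(sum((colMeans(a) - colMeans(b))^2))

print(c(single = single_d, complete = complete_d,
        average = average_d, centroid = centroid_d))
```

For these points the single distance is 3, the complete distance is sqrt(17), and the centroid distance is 3.5, so the same pair of clusters can look closer or farther apart depending on the linkage.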

To plot the dendrogram and cut the tree into three clusters:

# Plot the dendrogram, labelling each leaf with its species
plot(hc, hang = -1, labels=iris$Species[indices])
# Draw rectangles around the 3 clusters on the dendrogram
rect.hclust(hc, k=3)

# Cut the tree into 3 clusters; groups holds the cluster number assigned to each sample
groups <- cutree(hc, k=3)
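One way to judge the result is to cross-tabulate the cluster assignments against the known species. This is a minimal sketch that rebuilds the sample with an assumed seed so it can run on its own; the exact counts depend on which 30 rows are drawn.

```r
# Assumed seed, only so that the sample is reproducible
set.seed(42)
indices <- sample(seq_len(nrow(iris)), 30)
dataset <- iris[indices, -5]

hc <- hclust(dist(dataset), method = "average")
groups <- cutree(hc, k = 3)

# Rows are cluster numbers, columns are the true species of the sampled rows
tab <- table(cluster = groups, species = iris$Species[indices])
print(tab)
```

A cluster whose row has counts in a single column contains only one species; counts spread over several columns indicate that the cluster mixes species.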

Using single

hc <- hclust(dist(dataset), method="single")
# Plot the dendrogram, labelling each leaf with its species
plot(hc, hang = -1, labels=iris$Species[indices])
# Draw rectangles around the 3 clusters on the dendrogram
rect.hclust(hc, k=3)

# Cut the tree into 3 clusters; groups holds the cluster number assigned to each sample
groups <- cutree(hc, k=3)
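To see how much the choice of linkage matters, the two cuts can be compared directly. The sketch below is an illustration under an assumed seed, with hypothetical variable names; single linkage is often reported to chain points together into one large cluster, but whether that shows up here depends on the sample drawn.

```r
# Assumed seed, only so that the sample is reproducible
set.seed(1)
indices <- sample(seq_len(nrow(iris)), 30)
dataset <- iris[indices, -5]
d <- dist(dataset)

groups_average <- cutree(hclust(d, method = "average"), k = 3)
groups_single  <- cutree(hclust(d, method = "single"),  k = 3)

# Cross-tabulate the two assignments: rows are average-linkage clusters,
# columns are single-linkage clusters
agreement <- table(average = groups_average, single = groups_single)
print(agreement)

# Cluster sizes under single linkage
print(table(groups_single))
```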

Kyriakos Chatzidimitriou (kyrcha@gmail.com)

23 November, 2017
