Hierarchical Clustering

For hierarchical clustering we will use the iris dataset. To keep the clustering dendrogram readable, we randomly select 30 of its 150 instances and drop the Species column (column 5), so that only the four numeric measurements are used for clustering.

set.seed(1)  # make the random sample reproducible
indices <- sample(1:nrow(iris), 30)
dataset <- iris[indices, -5]  # keep the 4 numeric columns, drop Species

To do hierarchical clustering we will use the hclust function, applied to the Euclidean distance matrix computed by dist:

hc <- hclust(dist(dataset), method = "average")
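To see what hclust actually returns, here is a minimal sketch on a hypothetical toy dataset of four points on a line (0, 1, 5 and 7), not the iris sample. The merge matrix records which observations or clusters are joined at each step (negative numbers are single observations, positive numbers refer to the cluster formed at an earlier step), and height gives the linkage distance of each merge. The sketch also checks that hclust accepts the abbreviation "ave" for "average".

```r
# Hypothetical toy data: four points on a line (not part of the iris example)
toy <- data.frame(x = c(0, 1, 5, 7))

# Euclidean distances between all pairs of rows
d <- dist(toy)

hc_toy <- hclust(d, method = "ave")

# One row per merge step: negative = observation, positive = earlier step
print(hc_toy$merge)

# Average-linkage heights: {0,1} at 1, {5,7} at 2,
# then the two clusters at mean(5, 7, 4, 6) = 5.5
print(hc_toy$height)

# "ave" is partially matched to "average", so both calls build the same tree
hc_full <- hclust(d, method = "average")
stopifnot(identical(hc_toy$merge, hc_full$merge))
```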

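The method argument sets the linkage, i.e. how the distance between two clusters is computed from the distances between their members. hclust accepts "ward.D", "ward.D2", "single", "complete" (the default), "average" (UPGMA), "mcquitty" (WPGMA), "median" (WPGMC) and "centroid" (UPGMC). For example, single linkage uses the closest pair of points across the two clusters, complete linkage the farthest pair, and average linkage the mean over all pairs.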
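The difference between linkages can be checked by hand on a tiny hypothetical example with three points on a line (0, 1 and 5). All three methods first merge 0 and 1 at height 1; they differ only in the distance used to join that cluster with the point 5: single linkage takes min(5, 4), complete linkage max(5, 4), and average linkage their mean.

```r
# Hypothetical toy data: three points on a line
toy <- data.frame(x = c(0, 1, 5))
d <- dist(toy)

# Height of the last merge, i.e. the distance between cluster {0, 1} and point 5
final_height <- function(method, d) {
  hc <- hclust(d, method = method)
  tail(hc$height, 1)
}

methods <- c("single", "complete", "average")
heights <- sapply(methods, final_height, d = d)
print(heights)  # single = 4, complete = 5, average = 4.5
```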

To plot the dendrogram and cut the tree into three clusters:

# Plot the dendrogram; hang = -1 aligns all leaf labels at the bottom
plot(hc, hang = -1, labels = iris$Species[indices])
# Draw rectangles around the 3 clusters obtained by cutting the tree
rect.hclust(hc, k = 3)

# groups contains the cluster number (1 to 3) assigned to each sample
groups <- cutree(hc, k = 3)
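Since we know the true species of each sampled flower, a cross-tabulation shows how well the three clusters line up with the species. This is a sketch that repeats the sampling and average-linkage steps so it runs on its own; the seed value 1 is an arbitrary assumption, and a different sample will give a different table.

```r
# Repeat the sampling so the block is self-contained (seed is arbitrary)
set.seed(1)
indices <- sample(seq_len(nrow(iris)), 30)
dataset <- iris[indices, -5]

hc <- hclust(dist(dataset), method = "average")
groups <- cutree(hc, k = 3)

# Rows: cluster number, columns: true species of the sampled flowers
tab <- table(cluster = groups, species = iris$Species[indices])
print(tab)
```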

Repeating the same steps with single linkage:

hc <- hclust(dist(dataset), method="single")
# Plot the dendrogram; hang = -1 aligns all leaf labels at the bottom
plot(hc, hang = -1, labels = iris$Species[indices])
# Draw rectangles around the 3 clusters obtained by cutting the tree
rect.hclust(hc, k = 3)

# groups contains the cluster number (1 to 3) assigned to each sample
groups <- cutree(hc, k = 3)
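One way to compare the two linkages numerically, not used in the text above, is the cophenetic correlation: the correlation between the original pairwise distances and the heights at which each pair of points is first joined in the dendrogram. A value closer to 1 means the tree preserves the original distances more faithfully. The sketch below repeats the sampling with an arbitrary seed, so the exact values depend on the sample drawn.

```r
# Repeat the sampling so the block is self-contained (seed is arbitrary)
set.seed(1)
indices <- sample(seq_len(nrow(iris)), 30)
dataset <- iris[indices, -5]
d <- dist(dataset)

# Correlation between original distances and dendrogram (cophenetic) distances
coph_cor <- function(method, d) {
  hc <- hclust(d, method = method)
  cor(as.vector(cophenetic(hc)), as.vector(d))
}

coph <- sapply(c("single", "average"), coph_cor, d = d)
print(coph)
```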