## Hierarchical Clustering

For hierarchical clustering we will use the iris dataset. To make the clustering dendrogram less crowded we will select 30 instances from the iris dataset.

indices <- sample(1:nrow(iris), 30)
dataset <- iris[indices, -5]

To do hierachical clustering we will use the hclust method:

hc <- hclust(dist(dataset), method="ave")

We have different options to use in the method argument to define the distance between the clusters as:

• the distance between the closests points of the clusters (single)
• the distance between the farthests points of the clusters (complete)
• the average distance between all points in the clusters (average)
• the distance between the centroids of the clusters (centroid)
• and others (see ?hclust for a list)

To plot the dendrogram and cut the tree into three clusters:

# Plot the dendrogram
plot(hc, hang = -1, labels=iris$Species[indices]) # cut tree into 3 clusters rect.hclust(hc, k=3) # groups contain the distribution of the samples to the clusters/groups groups <- cutree(hc, k=3) ## Using single hc <- hclust(dist(dataset), method="single") # Plot the dendrogram plot(hc, hang = -1, labels=iris$Species[indices])
# cut tree into 3 clusters
rect.hclust(hc, k=3)

# groups contain the distribution of the samples to the clusters/groups
groups <- cutree(hc, k=3)