For hierarchical clustering we will use the iris dataset. To make the dendrogram less crowded, we will randomly select 30 instances from it:
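# Draw 30 random row indices (call set.seed() first for a reproducible sample)
indices <- sample(1:nrow(iris), 30)
# Keep the four numeric columns; column 5 (Species) is the class label
dataset <- iris[indices, -5]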
To do hierarchical clustering we will use the hclust function:
hc <- hclust(dist(dataset), method="average")
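As a minimal sketch of what hclust returns, the snippet below rebuilds the sample and inspects the merge and height components of the result. The seed value is an arbitrary assumption added here only so the sketch is reproducible; it is not part of the original walkthrough. With n observations there are always n - 1 merge steps.

```r
# Arbitrary seed, added only for reproducibility (an assumption of this sketch).
set.seed(42)
indices <- sample(1:nrow(iris), 30)
dataset <- iris[indices, -5]

hc <- hclust(dist(dataset), method = "average")

# Each row of hc$merge is one agglomeration step. Negative entries are
# single observations, positive entries refer to earlier merge steps.
head(hc$merge)

# hc$height holds the distance at which each merge happened.
head(hc$height)

# 30 observations give 29 merges.
nrow(hc$merge)
```

The heights are what the dendrogram draws on its vertical axis, so a large jump between consecutive heights suggests a natural place to cut the tree.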
The method argument defines how the distance between clusters is measured. Options include single, complete, average and centroid (see ?hclust for a list).
To plot the dendrogram and cut the tree into three clusters:
# Plot the dendrogram
plot(hc, hang = -1, labels=iris$Species[indices])
# Draw rectangles around the 3 clusters on the dendrogram
rect.hclust(hc, k=3)
# Cut the tree into 3 clusters; groups holds the cluster id (1 to 3) of each sample
groups <- cutree(hc, k=3)
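The vector returned by cutree can be compared with the known species labels. The following is a sketch under the same setup as above (the seed and the names species and confusion are assumptions introduced for illustration); it counts the samples per cluster and cross-tabulates clusters against species.

```r
# Arbitrary seed, added only for reproducibility (an assumption of this sketch).
set.seed(42)
indices <- sample(1:nrow(iris), 30)
dataset <- iris[indices, -5]

hc <- hclust(dist(dataset), method = "average")
groups <- cutree(hc, k = 3)

# groups is an integer vector with one cluster id (1, 2 or 3) per sampled row.
head(groups)

# Number of samples assigned to each cluster.
table(groups)

# Cross-tabulate cluster ids against the known species labels.
species <- iris$Species[indices]
confusion <- table(cluster = groups, species = species)
confusion
```

Cluster ids are arbitrary, so a good result shows each species concentrated in one row of the table rather than on the diagonal.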
For comparison, the same steps with single linkage:
hc <- hclust(dist(dataset), method="single")
# Plot the dendrogram
plot(hc, hang = -1, labels=iris$Species[indices])
# Draw rectangles around the 3 clusters on the dendrogram
rect.hclust(hc, k=3)
# Cut the tree into 3 clusters; groups holds the cluster id (1 to 3) of each sample
groups <- cutree(hc, k=3)
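To see how the choice of linkage changes the result without reading two dendrograms side by side, one option is to compare the cluster sizes each method produces. This is only a sketch: the seed and the names d, methods and sizes are assumptions introduced here, and only three of the linkage options are included.

```r
# Arbitrary seed, added only for reproducibility (an assumption of this sketch).
set.seed(42)
indices <- sample(1:nrow(iris), 30)
dataset <- iris[indices, -5]

# Compute the distance matrix once and reuse it for every linkage.
d <- dist(dataset)

methods <- c("single", "complete", "average")

# For each linkage, cut the tree into 3 clusters and count the cluster sizes.
sizes <- sapply(methods, function(m) {
  groups <- cutree(hclust(d, method = m), k = 3)
  as.vector(table(groups))
})

# One column per linkage method, one row per cluster id.
sizes
```

Single linkage often produces one large cluster plus a few very small ones (the chaining effect), whereas complete and average linkage tend to give more balanced sizes; the table makes that difference easy to check for a given sample.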