Hierarchical Clustering

  • The algorithm builds clusters by measuring the dissimilarities between data points.

  • Hierarchical clustering groups data points and visualizes the clusters using both a dendrogram and a scatter plot.

  • A dendrogram is a tree-like diagram that records the sequence of merges or splits.

Approaches for Clustering:

The clustering approaches can be broadly divided into two categories: Agglomerative and Divisive.

Agglomerative:

  • This approach first considers all the points as individual clusters.
  • It then finds the two most similar points and puts them into a cluster.
  • It keeps merging the most similar points and clusters until only one cluster is left, i.e., all points belong to one big cluster.
  • This is also called the bottom-up approach.
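The merging loop above can be sketched in a few lines of plain Python. The 1-D points and the single-linkage distance (closest pair of members) are illustrative choices here, not part of the original text:

```python
# Minimal sketch of bottom-up (agglomerative) merging on 1-D points.
# Single linkage (closest pair of members) is an illustrative choice.

def agglomerate(points):
    """Repeatedly merge the two closest clusters until one remains.

    Returns the sequence of merges as (cluster_a, cluster_b) tuples.
    """
    clusters = [[p] for p in points]          # every point starts alone
    merges = []
    while len(clusters) > 1:
        # find the pair of clusters whose closest members are nearest
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]   # merge cluster j into i
        del clusters[j]
    return merges

merges = agglomerate([1.0, 1.5, 5.0, 5.2, 9.9])
print(len(merges))   # 4 merges reduce 5 singleton clusters to 1
```

Note that five points always produce exactly four merges: each step reduces the cluster count by one.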

Divisive:

  • It is the opposite of the agglomerative approach.
  • It first considers all the points to be part of one big cluster, and in subsequent steps it finds the points/clusters that are least similar to each other and breaks the bigger cluster into smaller ones.
  • This continues until there are as many clusters as there are data points.
  • This is also called the top-down approach.
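The top-down procedure can be sketched the same way. Splitting a sorted 1-D cluster at its widest internal gap is an illustrative splitting rule, not something the text prescribes:

```python
# Minimal sketch of top-down (divisive) splitting on 1-D points.
# Splitting at the widest internal gap is an illustrative rule.

def divide(points):
    """Split clusters at their widest internal gap until all are singletons."""
    clusters = [sorted(points)]               # everything starts in one cluster
    while any(len(c) > 1 for c in clusters):
        # pick the cluster whose members are least similar: widest internal gap
        def widest_gap(c):
            return max((c[k + 1] - c[k] for k in range(len(c) - 1)), default=0.0)
        c = max(clusters, key=widest_gap)
        k = max(range(len(c) - 1), key=lambda i: c[i + 1] - c[i])
        clusters.remove(c)
        clusters += [c[:k + 1], c[k + 1:]]    # break it into two smaller clusters
    return clusters

print(divide([1.0, 1.5, 5.0, 5.2, 9.9]))
```

Each iteration breaks one cluster in two, so the loop ends when every cluster holds a single point, mirroring the stopping condition described above.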

How it works

  • We will use agglomerative clustering, which follows a bottom-up approach.

  • We begin by treating each data point as its own cluster.

  • Then, we join clusters together that have the shortest distance between them to create larger clusters.

  • This step is repeated until one large cluster is formed containing all of the data points.

  • Hierarchical clustering requires us to decide on both a distance and linkage method.

  • We will use Euclidean distance and the Ward linkage method, which at each step merges the pair of clusters that gives the smallest increase in total within-cluster variance.
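The distance/linkage choice can be tried out on a toy dataset with SciPy. The four 2-D points below are made up for illustration; `fcluster` cuts the resulting tree into a chosen number of flat clusters:

```python
# Euclidean distance with Ward linkage on a toy 2-D dataset
# (the data itself is illustrative, not the article's X).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X_toy = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]])

# SciPy's Ward linkage assumes Euclidean distance between observations
Z = linkage(X_toy, method='ward')

# cut the tree into 2 flat clusters; the two nearby pairs end up together
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```

The two points near (1, 1) receive one label and the two near (5, 5) the other, which is exactly the grouping the dendrogram's lowest merges suggest.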

Python Implementation for Hierarchical Clustering

# Import necessary libraries

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering

# X is the feature matrix of shape (n_samples, n_features) prepared earlier

# Creating dendrogram (use the same Ward linkage as the model below;
# linkage() defaults to single linkage otherwise)

linkage_data = linkage(X, method='ward')
dendrogram(linkage_data)

plt.show()

# Creating model
# (in scikit-learn >= 1.2 the 'affinity' parameter was renamed to 'metric')

model3 = AgglomerativeClustering(n_clusters=2,
                                 metric='euclidean',
                                 linkage='ward')
label3 = model3.fit_predict(X)

# Evaluation

from sklearn.metrics import silhouette_score
score = silhouette_score(X, label3)
print(score)
Output:
0.6867350732769781
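A score near 1 indicates well-separated clusters. One way to sanity-check the choice of n_clusters (a follow-up not in the original text) is to compare silhouette scores across several cluster counts; the two-blob dataset below is made up for illustration:

```python
# Compare silhouette scores for several cluster counts (illustrative data).
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# toy data: two well-separated blobs, so k = 2 should score highest
X_demo = np.vstack([rng.normal(0, 0.3, (20, 2)),
                    rng.normal(4, 0.3, (20, 2))])

for k in (2, 3, 4):
    labels = AgglomerativeClustering(n_clusters=k,
                                     linkage='ward').fit_predict(X_demo)
    print(k, round(silhouette_score(X_demo, labels), 3))
```

The count with the highest score is the most natural flat cut of the hierarchy for this data.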

I am an enthusiastic advocate for the transformative power of data in the fashion realm. Armed with a strong background in data science, I am committed to revolutionizing the industry by unlocking valuable insights, optimizing processes, and fostering a data-centric culture that propels fashion businesses into a successful and forward-thinking future. - Masud Rana, Certified Data Scientist, IABAC

© Data4Fashion 2023-2024

