Hierarchical Clustering

  • The algorithm builds clusters by measuring the dissimilarities between data points.

  • Hierarchical clustering groups data points and visualizes the clusters using both a dendrogram and a scatter plot.

  • A dendrogram is a tree-like diagram that records the sequence of merges or splits.

Approaches for Clustering:

The clustering approaches can be broadly divided into two categories: Agglomerative and Divisive.

Agglomerative:

  • This approach first considers all the points as individual clusters.
  • It then finds the two most similar points and puts them into a cluster.
  • It keeps merging the most similar points and clusters until only one cluster is left, i.e., all points belong to one big cluster.
  • This is also called the bottom-up approach.
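The merging loop above can be sketched in a few lines of plain Python. The 1-D points and the single-linkage distance (closest pair of members) are illustrative choices here, not part of the original text:

```python
# Minimal sketch of bottom-up (agglomerative) merging on 1-D points.
# Single linkage (closest pair of members) is an illustrative choice.

def agglomerate(points):
    """Repeatedly merge the two closest clusters until one remains.

    Returns the sequence of merges as (cluster_a, cluster_b) tuples.
    """
    clusters = [[p] for p in points]          # every point starts alone
    merges = []
    while len(clusters) > 1:
        # find the pair of clusters whose closest members are nearest
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]   # merge cluster j into i
        del clusters[j]
    return merges

merges = agglomerate([1.0, 1.5, 5.0, 5.2, 9.9])
print(len(merges))   # 4 merges reduce 5 singleton clusters to 1
```

Note that five points always produce exactly four merges: each step reduces the cluster count by one.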

Divisive:

  • It is the opposite of the agglomerative approach.
  • It first considers all the points to be part of one big cluster, and in subsequent steps it finds the points/clusters that are least similar to each other and breaks the bigger cluster into smaller ones.
  • This continues until there are as many clusters as there are data points.
  • This is also called the top-down approach.
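The top-down procedure can be sketched the same way. Splitting a sorted 1-D cluster at its widest internal gap is an illustrative splitting rule, not something the text prescribes:

```python
# Minimal sketch of top-down (divisive) splitting on 1-D points.
# Splitting at the widest internal gap is an illustrative rule.

def divide(points):
    """Split clusters at their widest internal gap until all are singletons."""
    clusters = [sorted(points)]               # everything starts in one cluster
    while any(len(c) > 1 for c in clusters):
        # pick the cluster whose members are least similar: widest internal gap
        def widest_gap(c):
            return max((c[k + 1] - c[k] for k in range(len(c) - 1)), default=0.0)
        c = max(clusters, key=widest_gap)
        k = max(range(len(c) - 1), key=lambda i: c[i + 1] - c[i])
        clusters.remove(c)
        clusters += [c[:k + 1], c[k + 1:]]    # break it into two smaller clusters
    return clusters

print(divide([1.0, 1.5, 5.0, 5.2, 9.9]))
```

Each iteration breaks one cluster in two, so the loop ends when every cluster holds a single point, mirroring the stopping condition described above.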

How it works

  • We will use agglomerative clustering, which follows a bottom-up approach.

  • We begin by treating each data point as its own cluster.

  • Then, we join clusters together that have the shortest distance between them to create larger clusters.

  • This step is repeated until one large cluster is formed containing all of the data points.

  • Hierarchical clustering requires us to decide on both a distance and linkage method.

  • We will use Euclidean distance and the Ward linkage method, which at each step merges the pair of clusters that gives the smallest increase in total within-cluster variance.
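The distance/linkage choice can be tried out on a toy dataset with SciPy. The four 2-D points below are made up for illustration; `fcluster` cuts the resulting tree into a chosen number of flat clusters:

```python
# Euclidean distance with Ward linkage on a toy 2-D dataset
# (the data itself is illustrative, not the article's X).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X_toy = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]])

# SciPy's Ward linkage assumes Euclidean distance between observations
Z = linkage(X_toy, method='ward')

# cut the tree into 2 flat clusters; the two nearby pairs end up together
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```

The two points near (1, 1) receive one label and the two near (5, 5) the other, which is exactly the grouping the dendrogram's lowest merges suggest.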

Python Implementation for Hierarchical Clustering

# Import necessary libraries

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering

# X is the feature matrix of shape (n_samples, n_features) prepared earlier

# Creating dendrogram (use the same Ward linkage as the model below;
# linkage() defaults to single linkage otherwise)

linkage_data = linkage(X, method='ward')
dendrogram(linkage_data)

plt.show()

# Creating model
# (in scikit-learn >= 1.2 the 'affinity' parameter was renamed to 'metric')

model3 = AgglomerativeClustering(n_clusters=2,
                                 metric='euclidean',
                                 linkage='ward')
label3 = model3.fit_predict(X)

# Evaluation

from sklearn.metrics import silhouette_score
score = silhouette_score(X, label3)
print(score)
Output:
0.6867350732769781
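A score near 1 indicates well-separated clusters. One way to sanity-check the choice of n_clusters (a follow-up not in the original text) is to compare silhouette scores across several cluster counts; the two-blob dataset below is made up for illustration:

```python
# Compare silhouette scores for several cluster counts (illustrative data).
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# toy data: two well-separated blobs, so k = 2 should score highest
X_demo = np.vstack([rng.normal(0, 0.3, (20, 2)),
                    rng.normal(4, 0.3, (20, 2))])

for k in (2, 3, 4):
    labels = AgglomerativeClustering(n_clusters=k,
                                     linkage='ward').fit_predict(X_demo)
    print(k, round(silhouette_score(X_demo, labels), 3))
```

The count with the highest score is the most natural flat cut of the hierarchy for this data.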

I am an enthusiastic advocate for the transformative power of data in the fashion realm. Armed with a strong background in data science, I am committed to revolutionizing the industry by unlocking valuable insights, optimizing processes, and fostering a data-centric culture that propels fashion businesses into a successful and forward-thinking future. - Masud Rana, Certified Data Scientist, IABAC

© Data4Fashion 2023-2024

