**The Curse of Dimensionality:**

Data scientists often work with datasets that have thousands of features.

This creates two kinds of problems:

- **Increase in computation time:** Most machine learning algorithms rely on calculating **distances** for model building, and as the number of **dimensions increases**, building a model becomes more and more **computation-intensive**. One more point to consider is that as the number of dimensions increases, **points move farther away** from each other.
- **Hard (or almost impossible) to visualise the relationship between features:** Humans are bound by their perception of a maximum of three dimensions. We can’t comprehend shapes/graphs beyond three dimensions. So, if we have an n-dimensional dataset, the only option left to us is to create either a 2-D or 3-D graph out of it.
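The claim that points move farther apart as dimensions are added can be checked numerically. The sketch below is only an illustration under stated assumptions: it draws random points uniformly from the unit hypercube, and the helper name `mean_pairwise_distance` is hypothetical.

```python
import numpy as np

def mean_pairwise_distance(n_dims, n_points=50, seed=0):
    """Average Euclidean distance between random points in the unit hypercube."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    # Differences between every pair of points, shape (n_points, n_points, n_dims)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep each unordered pair once (upper triangle, excluding the diagonal)
    upper = np.triu_indices(n_points, k=1)
    return dists[upper].mean()

for d in (2, 10, 100, 1000):
    print(d, round(mean_pairwise_distance(d), 2))
```

The printed average distance grows with the number of dimensions, which is one reason distance-based models become harder to fit on wide datasets.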

Disadvantages of having more dimensions:

- Training time increases
- Data Visualization becomes difficult
- Computational resources requirement increases
- The chance of overfitting is high
- Exploring the data becomes difficult

Two ways to address the curse of dimensionality:

- **Feature Selection:** Drop the less important features.
- **Dimensionality Reduction:** Derive new features from the existing set of features; this is also called feature extraction.

Among the many algorithms available, we will discuss **PCA** here.
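The difference between the two approaches can be made concrete with a small sketch. It assumes scikit-learn's bundled iris dataset and uses `SelectKBest` as one possible feature selection method; both choices are illustrative and are not part of the walkthrough in this article.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 original features

# Feature selection: keep 2 of the original columns unchanged
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
kept = selector.get_support(indices=True)  # indices of the columns that survived

# Feature extraction: build 2 new columns, each a mix of all 4 original ones
X_extracted = PCA(n_components=2).fit_transform(X)

print("Kept original columns:", kept)
print(X_selected.shape, X_extracted.shape)
```

Both results have two columns, but the selected columns are copies of original features, while the extracted columns are new derived features.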

**Principal Component Analysis (PCA)**

Principal component analysis is an **unsupervised machine learning** algorithm used for dimensionality reduction through feature extraction. As the name suggests, it finds the **principal components** of the data. PCA transforms the data from a higher-dimensional space to a new, lower-dimensional subspace.

This results in an entirely new coordinate system for the points, where the first axis corresponds to the first principal component, the direction that explains the most variance in the data.

The PCA algorithm is based on some mathematical concepts such as:

- **Variance and Covariance**
- **Eigenvalues and Eigenvectors**

**What are the principal components?**

Principal components are the **derived features** that explain the **maximum variance** in the data. The first principal component explains the most variance, the second a bit less, and so on.

Each of the new dimensions found using PCA is a linear combination of the original features.
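To see the "linear combination" idea in code, the following sketch reproduces scikit-learn's PCA output by hand. It runs on synthetic random data (an assumption made purely for illustration), and relies on the fitted attributes `mean_` and `components_` of `sklearn.decomposition.PCA`.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))  # synthetic data: 100 samples, 4 features

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)  # the new, derived features

# Each row of components_ holds the weights applied to the (centered)
# original features to produce one principal component.
manual = (X - pca.mean_) @ pca.components_.T

print(pca.components_.shape)
print(np.allclose(scores, manual))
```

Each principal component is therefore just a weighted sum of the centered original columns, with the weights stored in `components_`.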

**Some common terms used in PCA algorithm:**

- **Dimensionality:** The number of features or variables present in the given dataset. Put simply, it is the number of columns in the dataset.
- **Correlation:** How strongly two variables are related to each other, that is, how one variable changes when the other changes. The correlation value ranges from -1 to +1. A value of -1 indicates a perfect negative linear relationship (one variable decreases as the other increases), and +1 indicates a perfect positive linear relationship.
- **Orthogonal:** The variables are not correlated with each other, and hence the correlation between the pair of variables is zero.
- **Eigenvectors:** Given a square matrix M and a non-zero vector v, v is an eigenvector of M if Mv is a scalar multiple of v. That scalar is the corresponding eigenvalue.
- **Covariance Matrix:** A matrix containing the covariance between each pair of variables.
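The eigenvector and covariance matrix definitions can be verified with NumPy. This is a minimal sketch on synthetic data (an assumption for illustration); it uses `np.cov` to build the covariance matrix and `np.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))   # synthetic data: 200 samples, 3 features
X[:, 1] += 0.8 * X[:, 0]        # make two features correlated

# Covariance matrix: entry (i, j) is the covariance of feature i and feature j
M = np.cov(X, rowvar=False)

# Eigen-decomposition of the symmetric covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(M)

# Check the definition M v = lambda v for the largest eigenvalue
v = eigenvectors[:, -1]
lam = eigenvalues[-1]
print(np.allclose(M @ v, lam * v))

# Eigenvectors of a symmetric matrix are mutually orthogonal
print(np.allclose(eigenvectors.T @ eigenvectors, np.eye(3)))
```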

**Explained Variance Ratio**

It represents the **amount of variance** each principal component is able to explain. The total variance is the sum of the variances of all individual principal components.

The fraction of variance explained by a principal component is the ratio between the variance of that principal component and the total variance.

For example, suppose the variance of PC1 is 50 and the variance of PC2 is 5, so the total variance is 55.

EVR of PC1 = Variance of PC1 / Total variance = 50/55 = 0.91

EVR of PC2 = Variance of PC2 / Total variance = 5/55 = 0.09

Thus PC1 explains 91% of the variance of the data, whereas PC2 explains only 9%. Hence we can use only PC1 as the input for our model, as it explains the majority of the variance.

In real-life scenarios, the number of components to keep is usually decided with the help of **Scree Plots.**
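The arithmetic of the PC1/PC2 example can be written as a few lines of Python. The variable names are illustrative; the variances are the ones from the example.

```python
# Variances of the two principal components from the example
variances = {"PC1": 50, "PC2": 5}

total_variance = sum(variances.values())

# Explained variance ratio = component variance / total variance
evr = {name: var / total_variance for name, var in variances.items()}

for name, ratio in evr.items():
    print(f"{name}: {ratio:.2f}")
```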

**Steps involved in PCA:**

1. **Scaling the data:** PCA looks for the directions of maximum variance, and variance is naturally higher for features with large magnitudes. So we need to scale the data first.
2. **Calculate the covariance:** to understand which variables are highly correlated.
3. **Calculate eigenvectors and eigenvalues** (they are computed from the covariance matrix). **Eigenvectors** determine the **directions of the new feature space**. **Eigenvalues** determine their **magnitude**, i.e., the scalar of the respective eigenvector, which is the amount of variance along that direction.

   For example: if you have a 2-dimensional dataset, there will be 2 eigenvectors and their respective eigenvalues.

   The reason for computing the eigenvectors of the **covariance matrix** is to understand where in the data the variance is largest. The covariance matrix captures the overall variance among all the variables in the data. **More variance** means **more information** about the data, so the eigenvectors tell us where in the data we have maximum variance.
4. **Compute the Principal Components:**
   - After identifying the eigenvectors and eigenvalues, sort them in descending order of eigenvalue. The highest eigenvalue corresponds to the most significant component.
   - PCs are the new features that are obtained, and they possess most of the useful information that was scattered among the initial variables.
   - These PCs are orthogonal to each other, i.e., the correlation between any two PCs is zero.
5. **Reduce the dimensions of the data:**
   - Eliminate the PCs that have the smallest eigenvalues.
   - They explain the least variance, so they are the least important.
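The steps above can be sketched from scratch with NumPy. This is a simplified illustration rather than a reference implementation: the function name `pca_from_scratch` and the synthetic data are assumptions, and in practice a library such as scikit-learn would normally be used instead.

```python
import numpy as np

def pca_from_scratch(X, n_components):
    """Follow the PCA steps by hand; returns projected data and variance ratios."""
    # 1. Scale the data: zero mean and unit variance per feature
    X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the scaled features
    cov = np.cov(X_scaled, rowvar=False)
    # 3. Eigenvalues and eigenvectors (eigh is meant for symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort in descending order of eigenvalue
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # 5. Keep the top components and project the data onto them
    projected = X_scaled @ eigenvectors[:, :n_components]
    explained_variance_ratio = eigenvalues / eigenvalues.sum()
    return projected, explained_variance_ratio

# Synthetic data (assumption): 4 features built from 2 underlying signals
rng = np.random.default_rng(1)
base = rng.normal(size=(150, 2))
X = np.column_stack([
    base[:, 0],
    5 * base[:, 0] + rng.normal(size=150),
    base[:, 1],
    20 * base[:, 1] + rng.normal(size=150),
])

projected, evr = pca_from_scratch(X, n_components=2)
print(projected.shape)
print(np.round(evr, 3))
```

Because the four features are driven by two underlying signals, the first two components should carry most of the variance, and the projected columns are uncorrelated with each other.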

**Scree Plots:**

- Scree plots are graphs that show how much variance is explained by each principal component.
- The Scree Plot helps in deciding how many components to retain by identifying the “elbow” in the plot.

**Example:**

- In a scree plot where the explained variance drops significantly from Component 1 to Component 2, and then the drop becomes smaller and more gradual from Component 3 onwards, the elbow point would likely be at Component 2.
- This means that the first two principal components explain most of the variance, and adding more components has diminishing returns.
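A scree plot of this kind can be produced with a few lines of Matplotlib. The sketch below uses synthetic data in which 6 features are driven by 2 hidden factors; the data and variable names are assumptions chosen so that an elbow appears after the second component.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data (assumption): 6 features driven by 2 hidden factors plus small noise
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(300, 6))

# Eigenvalues of the covariance matrix give the variance along each component
eigenvalues = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending order
ratios = eigenvalues / eigenvalues.sum()

components = np.arange(1, len(ratios) + 1)
fig, ax = plt.subplots()
ax.plot(components, ratios, marker="o")
ax.set_xlabel("Principal component")
ax.set_ylabel("Explained variance ratio")
ax.set_title("Scree plot (synthetic data)")
# In a script, call plt.show() to display the figure; notebooks render it automatically.
```

With this construction the curve flattens out after the second component, which is where the elbow would be read off.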

**Python Implementation of PCA:**

```
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
```

```
# Load Dataset
data = pd.read_csv('/content/drive/MyDrive/Data Science/CDS-07-Machine Learning & Deep Learning/06. Machine Learning Model /10_Dimensionality-Reduction/PCA Class/glass.data')
```

```
data.head()
```

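Output:

| | index | RI | Na | Mg | Al | Si | K | Ca | Ba | Fe | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.52101 | 13.64 | 4.49 | 1.10 | 71.78 | 0.06 | 8.75 | 0.0 | 0.0 | 1 |
| 1 | 2 | 1.51761 | 13.89 | 3.60 | 1.36 | 72.73 | 0.48 | 7.83 | 0.0 | 0.0 | 1 |
| 2 | 3 | 1.51618 | 13.53 | 3.55 | 1.54 | 72.99 | 0.39 | 7.78 | 0.0 | 0.0 | 1 |
| 3 | 4 | 1.51766 | 13.21 | 3.69 | 1.29 | 72.61 | 0.57 | 8.22 | 0.0 | 0.0 | 1 |
| 4 | 5 | 1.51742 | 13.27 | 3.62 | 1.24 | 73.08 | 0.55 | 8.07 | 0.0 | 0.0 | 1 |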

```
data.Class.value_counts()
```

Output:

| Class | count |
|---|---|
| 2 | 76 |
| 1 | 70 |
| 7 | 29 |
| 3 | 17 |
| 5 | 13 |
| 6 | 9 |

EDA is skipped here.

**Data Preprocessing**

```
data.isnull().sum()
```

Output:

| Column | Missing values |
|---|---|
| index | 0 |
| RI | 0 |
| Na | 0 |
| Mg | 0 |
| Al | 0 |
| Si | 0 |
| K | 0 |
| Ca | 0 |
| Ba | 0 |
| Fe | 0 |
| Class | 0 |

```
# Creating x and y
x = data.drop(columns=['index', 'Class'])  # `columns=` already implies axis=1
y = data.Class
```

```
# Splitting training & testing data
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=73)
# Creating model
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(x_train,y_train)
y_predict = lr.predict(x_test)
```

```
from sklearn.metrics import accuracy_score
score1 = accuracy_score(y_test,y_predict)
score1
```

Output: 0.6976744186046512

**Perform PCA**

```
# Standardize the features (zero mean, unit variance)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
scaled_data = sc.fit_transform(x)
scaled_data
```

Output: array([[ 0.87286765, 0.28495326, 1.25463857, ..., -0.14576634, -0.35287683, -0.5864509 ], [-0.24933347, 0.59181718, 0.63616803, ..., -0.79373376, -0.35287683, -0.5864509 ], [-0.72131806, 0.14993314, 0.60142249, ..., -0.82894938, -0.35287683, -0.5864509 ], ..., [ 0.75404635, 1.16872135, -1.86551055, ..., -0.36410319, 2.95320036, -0.5864509 ], [-0.61239854, 1.19327046, -1.86551055, ..., -0.33593069, 2.81208731, -0.5864509 ], [-0.41436305, 1.00915211, -1.86551055, ..., -0.23732695, 3.01367739, -0.5864509 ]])

```
# Creating new dataframe
new_data = pd.DataFrame(data=scaled_data,columns= x.columns)
new_data.head()
```

Output:

| | RI | Na | Mg | Al | Si | K | Ca | Ba | Fe |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.872868 | 0.284953 | 1.254639 | -0.692442 | -1.127082 | -0.671705 | -0.145766 | -0.352877 | -0.586451 |
| 1 | -0.249333 | 0.591817 | 0.636168 | -0.170460 | 0.102319 | -0.026213 | -0.793734 | -0.352877 | -0.586451 |
| 2 | -0.721318 | 0.149933 | 0.601422 | 0.190912 | 0.438787 | -0.164533 | -0.828949 | -0.352877 | -0.586451 |
| 3 | -0.232831 | -0.242853 | 0.698710 | -0.310994 | -0.052974 | 0.112107 | -0.519052 | -0.352877 | -0.586451 |
| 4 | -0.312045 | -0.169205 | 0.650066 | -0.411375 | 0.555256 | 0.081369 | -0.624699 | -0.352877 | -0.586451 |

```
# Fit PCA with all components to see how much variance each one explains
from sklearn.decomposition import PCA
pca = PCA()
pca.fit(new_data)  # only the fitted attributes are needed here
pca.explained_variance_ratio_
```

Output: array([2.79018192e-01, 2.27785798e-01, 1.56093777e-01, 1.28651383e-01, 1.01555805e-01, 5.86261325e-02, 4.09953826e-02, 7.09477197e-03, 1.78757536e-04])

```
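# Cumulative explained variance plot (used to pick the number of components)
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)
plt.figure()
# Start the x-axis at 1 so it matches the number of components kept
plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, marker='o')
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance')  # fraction between 0 and 1
plt.title('Explained Variance')
plt.show()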
```

From the diagram, it can be seen that 5 principal components explain almost 90% of the variance of the data.

So instead of giving all the features as inputs, we feed only these 5 principal components to the machine learning algorithm, expecting to obtain a comparable result.

```
pca = PCA(n_components=5)
final_data = pca.fit_transform(new_data)
df = pd.DataFrame(data=final_data,
                  columns=['pca1','pca2','pca3','pca4','pca5'])
x1 = df
```
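As a cross-check on the choice of 5 components, the explained variance ratios printed earlier can be accumulated directly. The sketch below copies those ratios from the output shown above; the helper name `components_needed` is hypothetical.

```python
import numpy as np

# Explained variance ratios copied from the PCA output shown earlier
evr = np.array([2.79018192e-01, 2.27785798e-01, 1.56093777e-01,
                1.28651383e-01, 1.01555805e-01, 5.86261325e-02,
                4.09953826e-02, 7.09477197e-03, 1.78757536e-04])

cumulative = np.cumsum(evr)

def components_needed(cumulative, threshold):
    """Smallest number of components whose cumulative ratio reaches `threshold`.

    Assumes `threshold` does not exceed the total (about 1.0).
    """
    return int(np.searchsorted(cumulative, threshold) + 1)

print(np.round(cumulative, 3))
print(components_needed(cumulative, 0.89))
```

The first 5 components together explain about 89.3% of the variance, so reaching a strict 90% threshold would require a sixth component.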

```
# Splitting training & testing data
from sklearn.model_selection import train_test_split
x1_train,x1_test,y_train,y_test = train_test_split(x1,y,test_size=0.2,random_state=73)
# Creating model
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(x1_train,y_train)
y1_predict = lr.predict(x1_test)
```

```
from sklearn.metrics import accuracy_score
score2 = accuracy_score(y_test,y1_predict)
score2
```

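Output: 0.627906976744186

With 5 principal components the accuracy is about 0.63, compared with about 0.70 for the model trained on all 9 original features, so in this example some predictive information is lost along with the discarded components.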
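One caveat about the walkthrough above is that the scaler and PCA were fitted on the full dataset before the train/test split. A common alternative is to chain the steps in a scikit-learn pipeline so that they are fitted on the training data only. The sketch below is an illustration under stated assumptions: it uses scikit-learn's bundled wine dataset instead of the glass data (whose file path is specific to the author's environment), and `max_iter=1000` is an arbitrary choice to help the solver converge.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (assumption): 178 samples, 13 features, 3 classes
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=73, stratify=y
)

# Scaling, PCA and the classifier are fitted on the training split only
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    LogisticRegression(max_iter=1000),
)
pipe.fit(X_train, y_train)

accuracy = pipe.score(X_test, y_test)
print(round(accuracy, 3))
```

The same pipeline structure could be applied to the glass data by replacing `X` and `y` with the `x` and `y` created earlier.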