Parameters:
These refer to internal variables that are learned directly from the training data.
Example: In linear regression (y=mx+b), m (slope) and b (intercept) are parameters learned from data.
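For instance, a minimal scikit-learn sketch (on tiny synthetic data) shows the learned parameters exposed as coef_ (m) and intercept_ (b):
# Parameters m and b are learned by fitting to data
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])   # generated by y = 2x + 1
reg = LinearRegression().fit(X, y)
print(reg.coef_[0], reg.intercept_)  # ~2.0 (m) and ~1.0 (b), learned from data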
Hyperparameters:
These are external settings chosen manually before training to control the learning process. Hyperparameters govern how the model learns, and choosing the right values can significantly improve accuracy, efficiency, and generalization. Tuning these hyperparameters is therefore necessary to optimize a machine learning model's performance.
Example: The learning rate (which controls how fast the model updates m and b) is a hyperparameter that you must set before training.
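As a minimal sketch (a plain NumPy gradient-descent loop on synthetic data, not any library's internals), the learning rate simply scales every update made to the learned parameters m and b:
# The learning rate (hyperparameter, fixed up front) scales updates to m and b (parameters, learned)
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * X + 1.0                # true relationship: m=2, b=1
m, b = 0.0, 0.0                  # parameters, learned from data
learning_rate = 0.05             # hyperparameter, set before training

for _ in range(1000):
    y_pred = m * X + b
    grad_m = -2 * np.mean(X * (y - y_pred))  # gradient of MSE w.r.t. m
    grad_b = -2 * np.mean(y - y_pred)        # gradient of MSE w.r.t. b
    m -= learning_rate * grad_m              # step size set by the learning rate
    b -= learning_rate * grad_b

print(f"m={m:.3f}, b={b:.3f}")  # approaches m=2, b=1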
Hyperparameter Tuning:
It is the process of selecting the best values for a machine learning model’s hyperparameters to improve performance.
Common hyperparameter tuning techniques are:
- GridSearchCV
- RandomizedSearchCV
- Bayesian Optimization
- Hyperband (Successive Halving)
GridSearchCV:
- GridSearchCV tries every combination of the values passed in the dictionary and evaluates the model for each combination using cross-validation.
- The approach is called grid search because it searches a grid of hyperparameter values for the best combination.
Example:
Suppose we want to tune two hyperparameters, C and Alpha, of a Logistic Regression classifier, each with several candidate values.
Grid search builds one version of the model for every possible combination of hyperparameters and returns the best one.
As in the grid below,
for C = [0.1, 0.2, 0.3, 0.4, 0.5] and
Alpha = [0.1, 0.2, 0.3, 0.4],
the combination C=0.3 and Alpha=0.2 scores 0.73 (the highest), so it is selected.
C \ Alpha | 0.1 | 0.2 | 0.3 | 0.4 |
0.5 | 0.70 | 0.70 | 0.70 | 0.70 |
0.4 | 0.70 | 0.70 | 0.70 | 0.70 |
0.3 | 0.72 | 0.73 | 0.71 | 0.70 |
0.2 | 0.71 | 0.71 | 0.70 | 0.70 |
0.1 | 0.70 | 0.69 | 0.69 | 0.68 |
Parameters of GridSearchCV:
Parameter | Description |
estimator | The machine learning model to optimize (e.g., RandomForestClassifier()). |
scoring | The evaluation metric (e.g., ‘accuracy’, ‘f1’, ‘roc_auc’). |
cv | Number of cross-validation folds (e.g., cv=5 for 5-fold cross-validation). |
n_jobs | Number of CPU cores to use (-1 means use all available cores). |
verbose | Controls log output (0 = silent, 1 = minimal, 2 = detailed). |
refit | If True, retrains the best model on the full dataset after tuning. |
return_train_score | If True, also returns training scores. |
param_grid | Dictionary mapping hyperparameter names to lists of candidate values; every combination is tried. |
Limitation:
GridSearchCV exhaustively evaluates every possible combination of hyperparameters, which makes it computationally very expensive.
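The explosion is easy to quantify: scikit-learn's ParameterGrid can count the candidates in the grid used in the implementation below (6 x 5 x 19 = 570 combinations, and with cv=5 that means 2,850 model fits).
# Counting grid-search candidates with ParameterGrid
from sklearn.model_selection import ParameterGrid

param_grid = {'C': [0.1, 5, 10, 50, 60, 70],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'random_state': list(range(1, 20))}
print(len(ParameterGrid(param_grid)))  # 570 candidates; x 5 CV folds = 2850 fits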
Python Implementation for GridSearchCV:
# Import libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
# Load data
data = pd.read_csv('/content/drive/MyDrive/Data Science/CDS-07-Machine Learning & Deep Learning/06. Machine Learning Model /05_Support Vector Machines/SVM Class /Preprocessed_data.csv')
# Creating data
data2 = data.drop(['Loan_Status'], axis=1)
X = data2.iloc[:, 2:]
y = data['Loan_Status'].map({'Y':1,'N':0})
# Defining parameter grid (note: SVC's random_state has no effect unless probability=True)
param_grid = {'C': [0.1, 5, 10, 50, 60, 70],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'random_state': list(range(1, 20))}
# Model creation
from sklearn.svm import SVC
model1 = SVC()
# Tuning model
tuned_model = GridSearchCV(model1, param_grid, refit=True, verbose=2, scoring='f1', cv=5)
tuned_model.fit(X,y)
# Showing best parameters & corresponding score
print("Tuned SVM Parameters: {}".format(tuned_model.best_params_))
print("Best score is {}".format(tuned_model.best_score_))
Output:
Tuned SVM Parameters: {'C': 5, 'gamma': 0.1, 'random_state': 1}
Best score is 0.8436087845413335
RandomizedSearchCV:
- RandomizedSearchCV addresses the main limitation of GridSearchCV: it evaluates only a fixed number of combinations (n_iter) sampled from the specified lists or distributions.
- It picks points at random from the search space instead of visiting every grid cell.
- This greatly reduces the computational cost.
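Beyond fixed lists, param_distributions also accepts continuous distributions. A minimal sketch (on synthetic data from make_classification, with scipy's loguniform — assumptions not in the original example) samples C and gamma on a log scale:
# RandomizedSearchCV with continuous, log-uniform distributions
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)  # toy binary dataset
param_dist = {'C': loguniform(1e-1, 1e2),      # sample C on a log scale
              'gamma': loguniform(1e-4, 1e0)}  # sample gamma on a log scale
search = RandomizedSearchCV(SVC(), param_dist, n_iter=10, cv=5,
                            scoring='f1', random_state=42)
search.fit(X, y)
print(search.best_params_, search.best_score_)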
Parameters of RandomizedSearchCV:
Parameter | Description |
estimator | The machine learning model to optimize (e.g., RandomForestClassifier()). |
scoring | The evaluation metric (e.g., ‘accuracy’, ‘f1’, ‘roc_auc’). |
cv | Number of cross-validation folds (e.g., cv=5 for 5-fold cross-validation). |
n_jobs | Number of CPU cores to use (-1 means use all available cores). |
verbose | Controls log output (0 = silent, 1 = minimal, 2 = detailed). |
refit | If True, retrains the best model on the full dataset after tuning. |
return_train_score | If True, also returns training scores. |
param_distributions | Dictionary mapping hyperparameter names to lists or distributions to sample from. |
n_iter | Defines the number of random hyperparameter combinations to try. |
random_state | Controls randomness for reproducibility. |
Python Implementation for RandomizedSearchCV:
# Import libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
# Load data
data = pd.read_csv('/content/drive/MyDrive/Data Science/CDS-07-Machine Learning & Deep Learning/06. Machine Learning Model /05_Support Vector Machines/SVM Class /Preprocessed_data.csv')
# Creating data
data2 = data.drop(['Loan_Status'], axis=1)
X = data2.iloc[:,2:]
y = data['Loan_Status'].map({'Y':1,'N':0})
# Defining Parameter distribution
param_dist = {'C': [0.1, 5, 10, 50, 60, 70],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'random_state': list(range(1, 20))}
# Model creation
from sklearn.svm import SVC
model1 = SVC()
# Tuning Model using RandomizedSearchCV
random_search = RandomizedSearchCV(model1, param_dist, refit=True, verbose=2, scoring='f1', cv=5, n_iter=10, random_state=42)
random_search.fit(X, y)
# Showing best parameters & corresponding score
print("Tuned SVM Parameters: {}".format(random_search.best_params_))
print("Best score is {}".format(random_search.best_score_))
Output:
Tuned SVM Parameters: {'random_state': 18, 'gamma': 0.1, 'C': 5}
Best score is 0.8436087845413335
Bayesian Optimization:
- It uses a probabilistic (surrogate) model to find the best set of hyperparameters for a machine learning model.
- Instead of exhaustively testing every possibility (like Grid Search), it learns from past trials and predicts which hyperparameters might work best next.
How it works:
- Define the objective function
  - The function we want to optimize (e.g., maximize accuracy or minimize RMSE).
  - It takes hyperparameters as input and returns a score.
- Fit a probabilistic surrogate model
  - A model (e.g., a Gaussian Process) estimates the performance of different hyperparameters.
  - It predicts which hyperparameter values are promising based on the trials already evaluated.
- Select the next hyperparameters to test
  - An acquisition function chooses values that balance exploration and exploitation:
  - Exploration: tries new areas (new hyperparameter values).
  - Exploitation: focuses on areas that worked well before.
- Evaluate and update the model
  - The chosen hyperparameters are used to train the ML model, and its performance is recorded.
  - The surrogate model updates itself with the new result.
  - This process repeats until the best hyperparameters are found (a toy sketch of this loop follows the parameter table below).
Parameters of Bayesian Optimization (names vary by library):
Parameter | Description |
n_trials | Number of trials (iterations). |
n_startup_trials | The number of initial random trials before BO starts using the probabilistic model. |
acquisition_function | Determines how to choose the next set of parameters (e.g., Expected Improvement, Upper Confidence Bound). |
xi | Controls the trade-off between exploration and exploitation. |
random_state | Sets a seed for reproducibility. |
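The following toy sketch runs the loop above on a 1-D objective, assuming scikit-learn's GaussianProcessRegressor as the surrogate and Expected Improvement (with the xi trade-off parameter from the table) as the acquisition function; the objective here is a stand-in for a real cross-validated score.
# Minimal Bayesian-optimization loop: GP surrogate + Expected Improvement
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):
    # Toy stand-in for "cross-validated score at hyperparameter value x"
    return -(x - 2.0) ** 2 + 4.0

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    # EI balances exploration (high sigma) and exploitation (high mu)
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(3, 1))   # n_startup_trials = 3 random points
y = objective(X).ravel()

for _ in range(10):                  # n_trials = 10
    gp = GaussianProcessRegressor(normalize_y=True, alpha=1e-6).fit(X, y)
    X_cand = np.linspace(0, 5, 200).reshape(-1, 1)
    x_next = X_cand[np.argmax(expected_improvement(X_cand, gp, y.max()))]
    X = np.vstack([X, [x_next]])     # evaluate the chosen point and update
    y = np.append(y, objective(x_next[0]))

print("Best x:", X[np.argmax(y)][0], "best score:", y.max())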
Limitation:
- Although more efficient than Grid/Random Search, it is not ideal for high-dimensional search spaces.
- It pays off most for deep learning and large datasets, where each training run is slow; the extra computation the Gaussian Process surrogate requires is then small compared with the trials it saves.
Python Implementation for Bayesian Optimization:
# Import libraries
import numpy as np
import pandas as pd
import optuna
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
# Load data
data = pd.read_csv('/content/drive/MyDrive/Data Science/CDS-07-Machine Learning & Deep Learning/06. Machine Learning Model /05_Support Vector Machines/SVM Class /Preprocessed_data.csv')
# Creating data
data2 = data.drop(['Loan_Status'], axis=1)
X = data2.iloc[:, 2:]
y = data['Loan_Status'].map({'Y': 1, 'N': 0})
# Objective function for Optuna to optimize the hyperparameters
def objective(trial):
    # Suggest hyperparameters (log-uniform ranges for C and gamma)
    C = trial.suggest_float('C', 0.1, 100, log=True)
    gamma = trial.suggest_float('gamma', 0.0001, 1, log=True)
    random_state = trial.suggest_int('random_state', 1, 20)
    # Create model with the suggested hyperparameters
    model = SVC(C=C, gamma=gamma, random_state=random_state)
    # Perform cross-validation and return the mean F1 score
    score = cross_val_score(model, X, y, cv=5, scoring='f1').mean()
    return score
# Create and optimize the Optuna study
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20)
# Best parameters found by Optuna
print("Tuned SVM Parameters: {}".format(study.best_params))
print("Best score is {}".format(study.best_value))
Output:
Tuned SVM Parameters: {'C': 39.723066053068294, 'gamma': 0.024448094259763114, 'random_state': 14}
Best score is 0.8435800613044847
Hyperband (Successive Halving):
- It is an advanced hyperparameter tuning method that is often much faster than Grid Search and Bayesian Optimization.
- It works by allocating resources (like training time or dataset size) to different hyperparameter sets and quickly eliminates the worst-performing ones.
- Instead of training all models fully, Hyperband trains many models with fewer resources first, then increases resources for the best ones.
How Hyperband Works :
- Randomly generate hyperparameter sets (like Random Search).
- Assign limited resources (e.g., train with a small dataset or fewer epochs).
- Evaluate and eliminate the worst half (only keep the best models).
- Increase resources for the remaining models (repeat until one best model remains).
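The elimination schedule is easy to see in isolation. This minimal sketch uses a hypothetical evaluate(config, resource) function (standing in for actual model training) to keep the best half of the candidates and double the budget each round:
# Successive halving with factor=2: halve the pool, double the budget
import random

def evaluate(config, resource):
    # Hypothetical stand-in: more resource -> less noisy quality estimate
    return config["quality"] + random.gauss(0, 1.0 / resource)

random.seed(0)
configs = [{"quality": random.random()} for _ in range(16)]  # 16 random configs
resource = 1
while len(configs) > 1:
    scores = sorted(((evaluate(c, resource), c) for c in configs),
                    key=lambda s: s[0], reverse=True)
    configs = [c for _, c in scores[:len(scores) // 2]]  # keep the best half
    resource *= 2                                        # double the budget
print("Winner quality:", configs[0]["quality"], "final resource:", resource)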
Parameters of Hyperband (Successive Halving), as exposed by scikit-learn's HalvingGridSearchCV:
Parameter | Description |
max_resources | Maximum amount of the resource any candidate may use. |
min_resources | Amount of the resource assigned to each candidate in the first iteration. |
factor | The rate at which candidates are eliminated each round (e.g., factor=2 keeps the best half). |
resource | The resource that is increased each iteration (by default 'n_samples', the training-set size). |
cv | Number of cross-validation folds. |
Python Implementation for Hyperband (Successive Halving):
# Import libraries
import pandas as pd
from sklearn.svm import SVC
from sklearn.experimental import enable_halving_search_cv  # enables HalvingGridSearchCV
from sklearn.model_selection import HalvingGridSearchCV
# Load data
data = pd.read_csv('/content/drive/MyDrive/Data Science/CDS-07-Machine Learning & Deep Learning/06. Machine Learning Model /05_Support Vector Machines/SVM Class /Preprocessed_data.csv')
# Creating data
data2 = data.drop(['Loan_Status'], axis=1)
X = data2.iloc[:, 2:]
y = data['Loan_Status'].map({'Y': 1, 'N': 0})
# Define the hyperparameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.0001, 0.001, 0.01, 0.1, 1],
    'random_state': list(range(1, 21))
}
# Create the SVM model
model = SVC()
# Define the Hyperband (Successive Halving) search
tuned_model = HalvingGridSearchCV(
    model,
    param_grid,
    factor=2,                  # keep the best half of the candidates each iteration
    min_resources='exhaust',   # size the first round so the last round can use the full training set
    cv=5,                      # 5-fold cross-validation
    scoring='f1',
    verbose=2
)
# Fit the model
tuned_model.fit(X, y)
# Print the best parameters and best score
print("Tuned SVM Parameters:", tuned_model.best_params_)
print("Best score is:", tuned_model.best_score_)
Output:
Tuned SVM Parameters: {'C': 0.1, 'gamma': 0.001, 'random_state': 15}
Best score is: 0.8044520699053379
Hyperparameter spaces for different algorithms:
The dictionaries below are ready-made search spaces that can be passed to the tuners above (e.g., as param_grid or param_distributions).
# Define the hyperparameter space for linear regression
# ('normalize' was removed in scikit-learn 1.2; scale features beforehand if needed)
param_dist = {
    'fit_intercept': [True, False],
    'copy_X': [True, False]
}
# Define the hyperparameter space for lasso regression
param_dist = {
    'alpha': [0.1, 1.0, 10.0],
    'fit_intercept': [True, False],
    'copy_X': [True, False]
}
# Define the hyperparameter space for ridge regression
param_dist = {
    'alpha': [0.1, 1.0, 10.0],
    'fit_intercept': [True, False],
    'solver': ['auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga']
}
# Define the hyperparameter space for logistic regression
param_dist = {
    'penalty': ['l1', 'l2'],
    'C': [0.1, 1.0, 10.0],
    'fit_intercept': [True, False],
    'solver': ['liblinear', 'saga'],
    'max_iter': [100, 200, 500]
}
# Define the hyperparameter space for KNN
param_dist = {
    'n_neighbors': [3, 5, 7, 9],
    'weights': ['uniform', 'distance'],
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
    'p': [1, 2]
}
# Define the hyperparameter space for SVM
param_dist = {
    'C': [0.1, 1.0, 10.0],
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'gamma': ['scale', 'auto'],
    'degree': [2, 3, 4]
}
# Define the hyperparameter space for Naive Bayes
param_dist = {
    'var_smoothing': [1e-9, 1e-8, 1e-7, 1e-6]
}
# Define the hyperparameter space for Decision Tree
# ('auto' for max_features was removed in scikit-learn 1.3)
param_dist = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 5, 10, 15],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', None]
}
# Define the hyperparameter space for Random Forest
param_dist = {
    'n_estimators': [100, 200, 300],
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', None]
}
# Define the hyperparameter space for XGBoost
param_dist = {
    'learning_rate': [0.1, 0.01, 0.001],
    'max_depth': [3, 5, 7],
    'n_estimators': [100, 200, 300],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'gamma': [0, 1, 5]
}
# Define the hyperparameter space for K-Means clustering
param_dist = {
    'n_clusters': [2, 3, 4, 5],
    'init': ['k-means++', 'random'],
    'n_init': [10, 20, 30],
    'max_iter': [100, 200, 300]
}
# Define the hyperparameter space for DBSCAN clustering
param_dist = {
    'eps': [0.1, 0.3, 0.5],
    'min_samples': [2, 5, 10],
    'metric': ['euclidean', 'manhattan', 'chebyshev']
}
# Define the hyperparameter space for Neural Networks (MLP)
mlp_params = {
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 100)],
    'activation': ['relu', 'tanh', 'logistic'],
    'solver': ['adam', 'sgd'],
    'alpha': [0.0001, 0.001, 0.01, 0.1]
}
# Define the hyperparameter space for ANN
param_dist = {
    'hidden_layers': [1, 2, 3],
    'units': [16, 32, 64],
    'activation': ['relu', 'sigmoid'],
    'optimizer': ['adam', 'sgd'],
    'epochs': [10, 20, 30],
    'batch_size': [8, 16, 32]
}
# Define the hyperparameter space for CNN
param_dist = {
    'filters': [16, 32, 64],
    'kernel_size': [(3, 3), (5, 5)],
    'pool_size': [(2, 2), (3, 3)],
    'hidden_units': [64, 128, 256],
    'optimizer': ['adam', 'sgd'],
    'epochs': [10, 20, 30],
    'batch_size': [8, 16, 32]
}
}