Parameters:
These refer to internal variables that are learned directly from the training data.
Example: In linear regression (y=mx+b), m (slope) and b (intercept) are parameters learned from data.
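For instance, a minimal scikit-learn sketch (on tiny synthetic data) shows the learned parameters exposed as coef_ (m) and intercept_ (b):
# Parameters m and b are learned by fitting to data
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])   # generated by y = 2x + 1
reg = LinearRegression().fit(X, y)
print(reg.coef_[0], reg.intercept_)  # ~2.0 (m) and ~1.0 (b), learned from data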
Hyperparameters:
These are external settings chosen manually before training to control the learning process. Hyperparameters govern how the model learns, and choosing the right values can significantly improve accuracy, efficiency, and generalization. Tuning these hyperparameters is therefore necessary to optimize a machine learning model's performance.
Example: The learning rate (which controls how fast the model updates m and b) is a hyperparameter that you must set before training.
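As a minimal sketch (a plain NumPy gradient-descent loop on synthetic data, not any library's internals), the learning rate simply scales every update made to the learned parameters m and b:
# The learning rate (hyperparameter, fixed up front) scales updates to m and b (parameters, learned)
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * X + 1.0                # true relationship: m=2, b=1
m, b = 0.0, 0.0                  # parameters, learned from data
learning_rate = 0.05             # hyperparameter, set before training

for _ in range(1000):
    y_pred = m * X + b
    grad_m = -2 * np.mean(X * (y - y_pred))  # gradient of MSE w.r.t. m
    grad_b = -2 * np.mean(y - y_pred)        # gradient of MSE w.r.t. b
    m -= learning_rate * grad_m              # step size set by the learning rate
    b -= learning_rate * grad_b

print(f"m={m:.3f}, b={b:.3f}")  # approaches m=2, b=1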
Hyperparameter Tuning:
It is the process of selecting the best values for a machine learning model’s hyperparameters to improve performance.
Common hyperparameter tuning techniques are:
- GridSearchCV
- RandomizedSearchCV
- Bayesian Optimization
- Hyperband (Successive Halving)
GridSearchCV:
- GridSearchCV tries every combination of the values passed in the dictionary and evaluates the model for each combination using cross-validation.
- The approach is called grid search because it searches a grid of hyperparameter values for the best combination.
Example:
Suppose we want to tune two hyperparameters, C and Alpha, of a Logistic Regression classifier, each with several candidate values.
Grid search builds one version of the model for every possible combination of hyperparameters and returns the best one.
As in the grid below,
for C = [0.1, 0.2, 0.3, 0.4, 0.5] and
Alpha = [0.1, 0.2, 0.3, 0.4],
the combination C=0.3 and Alpha=0.2 scores 0.73 (the highest), so it is selected.
C \ Alpha | 0.1 | 0.2 | 0.3 | 0.4 |
0.5 | 0.70 | 0.70 | 0.70 | 0.70 |
0.4 | 0.70 | 0.70 | 0.70 | 0.70 |
0.3 | 0.72 | 0.73 | 0.71 | 0.70 |
0.2 | 0.71 | 0.71 | 0.70 | 0.70 |
0.1 | 0.70 | 0.69 | 0.69 | 0.68 |
Parameters of GridSearchCV:
Parameter | Description |
estimator | The machine learning model to optimize (e.g., RandomForestClassifier()). |
scoring | The evaluation metric (e.g., ‘accuracy’, ‘f1’, ‘roc_auc’). |
cv | Number of cross-validation folds (e.g., cv=5 for 5-fold cross-validation). |
n_jobs | Number of CPU cores to use (-1 means use all available cores). |
verbose | Controls log output (0 = silent, 1 = minimal, 2 = detailed). |
refit | If True, retrains the best model on the full dataset after tuning. |
return_train_score | If True, also returns training scores. |
param_grid | Dictionary mapping hyperparameter names to lists of candidate values; every combination is tried. |
Limitation:
GridSearchCV exhaustively evaluates every possible combination of hyperparameters, which makes it computationally very expensive.
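The explosion is easy to quantify: scikit-learn's ParameterGrid can count the candidates in the grid used in the implementation below (6 x 5 x 19 = 570 combinations, and with cv=5 that means 2,850 model fits).
# Counting grid-search candidates with ParameterGrid
from sklearn.model_selection import ParameterGrid

param_grid = {'C': [0.1, 5, 10, 50, 60, 70],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'random_state': list(range(1, 20))}
print(len(ParameterGrid(param_grid)))  # 570 candidates; x 5 CV folds = 2850 fits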
Python Implementation for GridSearchCV:
# Import libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV
# Load data
data = pd.read_csv('/content/drive/MyDrive/Data Science/CDS-07-Machine Learning & Deep Learning/06. Machine Learning Model /05_Support Vector Machines/SVM Class /Preprocessed_data.csv')
# Creating data
data2 = data.drop(['Loan_Status'], axis=1)
X = data2.iloc[:, 2:]
y = data['Loan_Status'].map({'Y':1,'N':0})
# Defining parameter grid (note: SVC's random_state has no effect unless probability=True)
param_grid = {'C': [0.1, 5, 10, 50, 60, 70],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'random_state': list(range(1, 20))}
# Model creation
from sklearn.svm import SVC
model1 = SVC()
# Tuning model
tuned_model = GridSearchCV(model1, param_grid, refit=True, verbose=2, scoring='f1', cv=5)
tuned_model.fit(X,y)
# Showing best parameters & corresponding score
print("Tuned SVM Parameters: {}".format(tuned_model.best_params_))
print("Best score is {}".format(tuned_model.best_score_))
Output:
Tuned SVM Parameters: {'C': 5, 'gamma': 0.1, 'random_state': 1}
Best score is 0.8436087845413335
RandomizedSearchCV:
- RandomizedSearchCV addresses the main limitation of GridSearchCV: it evaluates only a fixed number of combinations (n_iter) sampled from the specified lists or distributions.
- It picks points at random from the search space instead of visiting every grid cell.
- This greatly reduces the computational cost.
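Beyond fixed lists, param_distributions also accepts continuous distributions. A minimal sketch (on synthetic data from make_classification, with scipy's loguniform — assumptions not in the original example) samples C and gamma on a log scale:
# RandomizedSearchCV with continuous, log-uniform distributions
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)  # toy binary dataset
param_dist = {'C': loguniform(1e-1, 1e2),      # sample C on a log scale
              'gamma': loguniform(1e-4, 1e0)}  # sample gamma on a log scale
search = RandomizedSearchCV(SVC(), param_dist, n_iter=10, cv=5,
                            scoring='f1', random_state=42)
search.fit(X, y)
print(search.best_params_, search.best_score_)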
Parameters of RandomizedSearchCV:
Parameter | Description |
estimator | The machine learning model to optimize (e.g., RandomForestClassifier()). |
scoring | The evaluation metric (e.g., ‘accuracy’, ‘f1’, ‘roc_auc’). |
cv | Number of cross-validation folds (e.g., cv=5 for 5-fold cross-validation). |
n_jobs | Number of CPU cores to use (-1 means use all available cores). |
verbose | Controls log output (0 = silent, 1 = minimal, 2 = detailed). |
refit | If True, retrains the best model on the full dataset after tuning. |
return_train_score | If True, also returns training scores. |
param_distributions | Dictionary mapping hyperparameter names to lists or distributions to sample from. |
n_iter | Defines the number of random hyperparameter combinations to try. |
random_state | Controls randomness for reproducibility. |
Python Implementation for RandomizedSearchCV:
# Import libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import RandomizedSearchCV
# Load data
data = pd.read_csv('/content/drive/MyDrive/Data Science/CDS-07-Machine Learning & Deep Learning/06. Machine Learning Model /05_Support Vector Machines/SVM Class /Preprocessed_data.csv')
# Creating data
data2 = data.drop(['Loan_Status'], axis=1)
X = data2.iloc[:,2:]
y = data['Loan_Status'].map({'Y':1,'N':0})
# Defining Parameter distribution
param_dist = {'C': [0.1, 5, 10, 50, 60, 70],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'random_state': list(range(1, 20))}
# Model creation
from sklearn.svm import SVC
model1 = SVC()
# Tuning Model using RandomizedSearchCV
random_search = RandomizedSearchCV(model1, param_dist, refit=True, verbose=2, scoring='f1', cv=5, n_iter=10, random_state=42)
random_search.fit(X, y)
# Showing best parameters & corresponding score
print("Tuned SVM Parameters: {}".format(random_search.best_params_))
print("Best score is {}".format(random_search.best_score_))
Output:
Tuned SVM Parameters: {'random_state': 18, 'gamma': 0.1, 'C': 5}
Best score is 0.8436087845413335
Bayesian Optimization:
- It uses a probabilistic (surrogate) model to find the best set of hyperparameters for a machine learning model.
- Instead of exhaustively testing every possibility (like Grid Search), it learns from past trials and predicts which hyperparameters might work best next.
How it works:
- Define the objective function
  - The function we want to optimize (e.g., maximize accuracy or minimize RMSE).
  - It takes hyperparameters as input and returns a score.
- Fit a probabilistic surrogate model
  - A model (e.g., a Gaussian Process) estimates the performance of different hyperparameters.
  - It predicts which hyperparameter values are promising based on the trials already evaluated.
- Select the next hyperparameters to test
  - An acquisition function chooses values that balance exploration and exploitation:
  - Exploration: tries new areas (new hyperparameter values).
  - Exploitation: focuses on areas that worked well before.
- Evaluate and update the model
  - The chosen hyperparameters are used to train the ML model, and its performance is recorded.
  - The surrogate model updates itself with the new result.
  - This process repeats until the best hyperparameters are found (a toy sketch of this loop follows the parameter table below).
Parameters of Bayesian Optimization (names vary by library):
Parameter | Description |
n_trials | Number of trials (iterations). |
n_startup_trials | The number of initial random trials before BO starts using the probabilistic model. |
acquisition_function | Determines how to choose the next set of parameters (e.g., Expected Improvement, Upper Confidence Bound). |
xi | Controls the trade-off between exploration and exploitation. |
random_state | Sets a seed for reproducibility. |
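The following toy sketch runs the loop above on a 1-D objective, assuming scikit-learn's GaussianProcessRegressor as the surrogate and Expected Improvement (with the xi trade-off parameter from the table) as the acquisition function; the objective here is a stand-in for a real cross-validated score.
# Minimal Bayesian-optimization loop: GP surrogate + Expected Improvement
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):
    # Toy stand-in for "cross-validated score at hyperparameter value x"
    return -(x - 2.0) ** 2 + 4.0

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    # EI balances exploration (high sigma) and exploitation (high mu)
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(3, 1))   # n_startup_trials = 3 random points
y = objective(X).ravel()

for _ in range(10):                  # n_trials = 10
    gp = GaussianProcessRegressor(normalize_y=True, alpha=1e-6).fit(X, y)
    X_cand = np.linspace(0, 5, 200).reshape(-1, 1)
    x_next = X_cand[np.argmax(expected_improvement(X_cand, gp, y.max()))]
    X = np.vstack([X, [x_next]])     # evaluate the chosen point and update
    y = np.append(y, objective(x_next[0]))

print("Best x:", X[np.argmax(y)][0], "best score:", y.max())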
Limitation:
- Although more efficient than Grid/Random Search, it is not ideal for high-dimensional search spaces.
- It pays off most for deep learning and large datasets, where each training run is slow; the extra computation the Gaussian Process surrogate requires is then small compared with the trials it saves.
Python Implementation for Bayesian Optimization:
# Import libraries
import numpy as np
import pandas as pd
import optuna
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
# Load data
data = pd.read_csv('/content/drive/MyDrive/Data Science/CDS-07-Machine Learning & Deep Learning/06. Machine Learning Model /05_Support Vector Machines/SVM Class /Preprocessed_data.csv')
# Creating data
data2 = data.drop(['Loan_Status'], axis=1)
X = data2.iloc[:, 2:]
y = data['Loan_Status'].map({'Y': 1, 'N': 0})
# Objective function for Optuna to optimize the hyperparameters
def objective(trial):
    # Suggest hyperparameters (log-uniform ranges for C and gamma)
    C = trial.suggest_float('C', 0.1, 100, log=True)
    gamma = trial.suggest_float('gamma', 0.0001, 1, log=True)
    random_state = trial.suggest_int('random_state', 1, 20)
    # Create model with the suggested hyperparameters
    model = SVC(C=C, gamma=gamma, random_state=random_state)
    # Perform cross-validation and return the mean F1 score
    score = cross_val_score(model, X, y, cv=5, scoring='f1').mean()
    return score
# Create and optimize the Optuna study
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20)
# Best parameters found by Optuna
print("Tuned SVM Parameters: {}".format(study.best_params))
print("Best score is {}".format(study.best_value))
Output:
Tuned SVM Parameters: {'C': 39.723066053068294, 'gamma': 0.024448094259763114, 'random_state': 14}
Best score is 0.8435800613044847
Hyperband (Successive Halving):
- It is an advanced hyperparameter tuning method that is often much faster than Grid Search and Bayesian Optimization.
- It works by allocating resources (like training time or dataset size) to different hyperparameter sets and quickly eliminates the worst-performing ones.
- Instead of training all models fully, Hyperband trains many models with fewer resources first, then increases resources for the best ones.
How Hyperband Works :
- Randomly generate hyperparameter sets (like Random Search).
- Assign limited resources (e.g., train with a small dataset or fewer epochs).
- Evaluate and eliminate the worst half (only keep the best models).
- Increase resources for the remaining models (repeat until one best model remains).
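The elimination schedule is easy to see in isolation. This minimal sketch uses a hypothetical evaluate(config, resource) function (standing in for actual model training) to keep the best half of the candidates and double the budget each round:
# Successive halving with factor=2: halve the pool, double the budget
import random

def evaluate(config, resource):
    # Hypothetical stand-in: more resource -> less noisy quality estimate
    return config["quality"] + random.gauss(0, 1.0 / resource)

random.seed(0)
configs = [{"quality": random.random()} for _ in range(16)]  # 16 random configs
resource = 1
while len(configs) > 1:
    scores = sorted(((evaluate(c, resource), c) for c in configs),
                    key=lambda s: s[0], reverse=True)
    configs = [c for _, c in scores[:len(scores) // 2]]  # keep the best half
    resource *= 2                                        # double the budget
print("Winner quality:", configs[0]["quality"], "final resource:", resource)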
Parameters of Hyperband (Successive Halving), as exposed by scikit-learn's HalvingGridSearchCV:
Parameter | Description |
max_resources | Maximum amount of the resource any candidate may use. |
min_resources | Amount of the resource assigned to each candidate in the first iteration. |
factor | The rate at which candidates are eliminated each round (e.g., factor=2 keeps the best half). |
resource | The resource that is increased each iteration (by default 'n_samples', the training-set size). |
cv | Number of cross-validation folds. |
Python Implementation for Hyperband (Successive Halving):
# Import libraries
import pandas as pd
from sklearn.svm import SVC
from sklearn.experimental import enable_halving_search_cv  # enables HalvingGridSearchCV
from sklearn.model_selection import HalvingGridSearchCV
# Load data
data = pd.read_csv('/content/drive/MyDrive/Data Science/CDS-07-Machine Learning & Deep Learning/06. Machine Learning Model /05_Support Vector Machines/SVM Class /Preprocessed_data.csv')
# Creating data
data2 = data.drop(['Loan_Status'], axis=1)
X = data2.iloc[:, 2:]
y = data['Loan_Status'].map({'Y': 1, 'N': 0})
# Define the hyperparameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.0001, 0.001, 0.01, 0.1, 1],
    'random_state': list(range(1, 21))
}
# Create the SVM model
model = SVC()
# Define the Hyperband (Successive Halving) search
tuned_model = HalvingGridSearchCV(
    model,
    param_grid,
    factor=2,                  # keep the best half of the candidates each iteration
    min_resources='exhaust',   # size the first round so the last round can use the full training set
    cv=5,                      # 5-fold cross-validation
    scoring='f1',
    verbose=2
)
# Fit the model
tuned_model.fit(X, y)
# Print the best parameters and best score
print("Tuned SVM Parameters:", tuned_model.best_params_)
print("Best score is:", tuned_model.best_score_)
Output:
Tuned SVM Parameters: {'C': 0.1, 'gamma': 0.001, 'random_state': 15}
Best score is: 0.8044520699053379
Hyperparameter spaces for different algorithms:
The dictionaries below are ready-made search spaces that can be passed to the tuners above (e.g., as param_grid or param_distributions).
# Define the hyperparameter space for linear regression
# ('normalize' was removed in scikit-learn 1.2; scale features beforehand if needed)
param_dist = {
    'fit_intercept': [True, False],
    'copy_X': [True, False]
}
# Define the hyperparameter space for lasso regression
param_dist = {
    'alpha': [0.1, 1.0, 10.0],
    'fit_intercept': [True, False],
    'copy_X': [True, False]
}
# Define the hyperparameter space for ridge regression
param_dist = {
    'alpha': [0.1, 1.0, 10.0],
    'fit_intercept': [True, False],
    'solver': ['auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga']
}
# Define the hyperparameter space for logistic regression
param_dist = {
    'penalty': ['l1', 'l2'],
    'C': [0.1, 1.0, 10.0],
    'fit_intercept': [True, False],
    'solver': ['liblinear', 'saga'],
    'max_iter': [100, 200, 500]
}
# Define the hyperparameter space for KNN
param_dist = {
    'n_neighbors': [3, 5, 7, 9],
    'weights': ['uniform', 'distance'],
    'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
    'p': [1, 2]
}
# Define the hyperparameter space for SVM
param_dist = {
    'C': [0.1, 1.0, 10.0],
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'gamma': ['scale', 'auto'],
    'degree': [2, 3, 4]
}
# Define the hyperparameter space for Naive Bayes
param_dist = {
    'var_smoothing': [1e-9, 1e-8, 1e-7, 1e-6]
}
# Define the hyperparameter space for Decision Tree
# ('auto' for max_features was removed in scikit-learn 1.3)
param_dist = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 5, 10, 15],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', None]
}
# Define the hyperparameter space for Random Forest
param_dist = {
    'n_estimators': [100, 200, 300],
    'criterion': ['gini', 'entropy'],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', None]
}
# Define the hyperparameter space for XGBoost
param_dist = {
    'learning_rate': [0.1, 0.01, 0.001],
    'max_depth': [3, 5, 7],
    'n_estimators': [100, 200, 300],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'gamma': [0, 1, 5]
}
# Define the hyperparameter space for K-Means clustering
param_dist = {
    'n_clusters': [2, 3, 4, 5],
    'init': ['k-means++', 'random'],
    'n_init': [10, 20, 30],
    'max_iter': [100, 200, 300]
}
# Define the hyperparameter space for DBSCAN clustering
param_dist = {
    'eps': [0.1, 0.3, 0.5],
    'min_samples': [2, 5, 10],
    'metric': ['euclidean', 'manhattan', 'chebyshev']
}
# Define the hyperparameter space for Neural Networks (MLP)
mlp_params = {
    'hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 100)],
    'activation': ['relu', 'tanh', 'logistic'],
    'solver': ['adam', 'sgd'],
    'alpha': [0.0001, 0.001, 0.01, 0.1]
}
# Define the hyperparameter space for ANN
param_dist = {
    'hidden_layers': [1, 2, 3],
    'units': [16, 32, 64],
    'activation': ['relu', 'sigmoid'],
    'optimizer': ['adam', 'sgd'],
    'epochs': [10, 20, 30],
    'batch_size': [8, 16, 32]
}
# Define the hyperparameter space for CNN
param_dist = {
    'filters': [16, 32, 64],
    'kernel_size': [(3, 3), (5, 5)],
    'pool_size': [(2, 2), (3, 3)],
    'hidden_units': [64, 128, 256],
    'optimizer': ['adam', 'sgd'],
    'epochs': [10, 20, 30],
    'batch_size': [8, 16, 32]
}
}