Ridge and Lasso Regression
Ridge and lasso regression are machine learning algorithms with built-in regularization functionality.
Built upon the essentials of linear regression with an additional penalty term, they serve as a calibration tool for preventing overfitting.
Categorically, ridge and lasso regressions are both regularization methods.
Regularization is an approach that is particularly effective when our data also suffers from multicollinearity.
Multicollinearity means that the independent variables in the regression model are highly correlated with each other.
What is Regularization?
Regularization is a technique that prevents overfitting by adding extra information to the model.
We use it in regression to keep the model from fixating too much on noise and irrelevant data.
In simple terms, regularization refers to a range of techniques that aim to make your model simpler.
It works by deliberately introducing a small amount of bias into the model so that it does not fit the training data points too perfectly.
Types of Regularization
- Lasso regression uses L-1 regularization and
- Ridge regression uses L-2 regularization.
What separates them is the form of the additional information, known as the penalty term, which serves as the regularization component.
L-1 regularization applies an L-1 penalty equal to the absolute value of the magnitude of the coefficients.
- It restricts the size of the coefficients, making some of them equal to zero. Mathematically, the L-1 penalty term is represented by the following formula: ∑ |βj|
L-2 regularization, on the other hand, adds an L-2 penalty equal to the square of the magnitude of the coefficients.
Here, all coefficients are shrunk by the same factor. Their values become closer to zero, but they are never actually zero.
Mathematically, the L-2 penalty term is represented by the following formula: ∑ βj^2
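As a quick illustration, here is a minimal sketch (using NumPy and a purely hypothetical coefficient vector) of how the two penalty terms are computed:
import numpy as np
# Hypothetical coefficient vector, for illustration only
beta = np.array([0.5, -1.2, 0.0, 3.4])
l1_penalty = np.sum(np.abs(beta))  # L-1 penalty: sum of absolute values -> ~5.1
l2_penalty = np.sum(beta ** 2)     # L-2 penalty: sum of squared values -> ~13.25
print("L-1 penalty:", l1_penalty)
print("L-2 penalty:", l2_penalty)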
Ridge Regression
Ridge regression is essentially a regularization technique for dealing with overfitted data.
It is a linear regression with an additional penalty term equal to the square of the magnitude of the coefficients.
To define the right relationship between independent and dependent variables with a linear regression, we use a cost function that minimizes the sum of the squared differences between predicted and actual values. In other words, the aim is to find the best possible values for the intercept and the slope in order to obtain the smallest errors. That’s why it is called the least-squares cost function, and it looks like this:
∑(Ŷi − Yi)^2
- Ŷi – predicted values
- Yi – actual values
In ridge regression, we don’t want to minimize only the squared error, but also the additional regularization penalty term, controlled by a tuning parameter.
This parameter determines how much bias we’ll add to the model and is most often denoted by lambda (λ). Ridge regression uses the L-2 penalty term: λ∑βj^2
- λ – a tuning parameter controlling the penalty term
- The higher the value of lambda, the bigger the penalty. If lambda equals zero, ridge regression reduces to a regular least-squares regression.
The proper value of lambda is most often estimated with the help of a technique called cross-validation. Applying an appropriate value for the tuning parameter should:
- prevent multicollinearity and overfitting from occurring
- reduce the model’s complexity
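To make the ridge objective concrete, here is a minimal sketch (with hypothetical NumPy arrays for the actual values, predictions, and coefficients) of the least-squares error combined with the L-2 penalty; note that with lam=0 it falls back to plain least squares:
import numpy as np

def ridge_cost(y_actual, y_pred, beta, lam):
    # Least-squares error plus the L-2 penalty scaled by lambda
    squared_error = np.sum((y_pred - y_actual) ** 2)
    l2_penalty = lam * np.sum(beta ** 2)
    return squared_error + l2_penalty

# Hypothetical values, for illustration only
y_actual = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])
beta = np.array([1.2, -0.8])
print(ridge_cost(y_actual, y_pred, beta, lam=0.0))  # 1.5 (plain least squares)
print(ridge_cost(y_actual, y_pred, beta, lam=1.0))  # ~3.58 (penalized objective)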
Lasso Regression
In lasso regression, the penalty term is the sum of the absolute values (magnitudes) of the coefficients.
Lasso uses L-1 regularization: λ∑|βj|
- λ – a tuning parameter
Note:
In the Python (scikit-learn) implementation, lambda (λ) is passed as the alpha parameter.
β represents the coefficients (slopes) of the regression line.
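For example, in scikit-learn the penalty strength is passed through the alpha argument (the values below are arbitrary):
from sklearn.linear_model import Ridge, Lasso
ridge_model = Ridge(alpha=1.0)  # alpha plays the role of lambda
lasso_model = Lasso(alpha=0.5)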
Difference between Lasso & Ridge regression
Conceptually, the two methods have the same goal – to increase the bias and lower the variance in order to prevent overfitting.
The major difference between the two algorithms is that ridge shrinks the coefficients so they become closer to zero but never exactly zero, while lasso can shrink them all the way to zero.
What the lasso regression does is decrease the values of the irrelevant parameters to zero, so that they don’t participate in the equation.
This way, our model only has variables that are important for the predictions.
Such a process is also known as feature selection as it excludes the irrelevant variables from the equation and leaves us with a subset containing only the useful ones.
A huge benefit of using a lasso regression is that it’s very suitable when dealing with big datasets because it can easily lower the variance in models with many features.
y = β0 + β1x1 + β2x2 + … + βnxn
⇓
y = β0 + 0·x1 + 0·x2 + … + βnxn
⇓
y = β0 + … + βnxn
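Below is a minimal sketch on a small synthetic dataset (purely for illustration) showing how lasso can drive irrelevant coefficients to exactly zero while ridge only shrinks them:
import numpy as np
from sklearn.linear_model import Lasso, Ridge
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually influence the target
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=10).fit(X, y)
print("Lasso coefficients:", lasso.coef_)  # irrelevant features are typically exactly 0
print("Ridge coefficients:", ridge.coef_)  # shrunk towards zero, but not exactly 0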
Python Implementation for Ridge & Lasso regression:
# import necessary libraries
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import RepeatedKFold
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import RidgeCV
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
# Load dataset
data = pd.read_csv('/content/drive/MyDrive/Data Science/CDS-07-Machine Learning & Deep Learning/06. Machine Learning Model /03_Ridge Lasso Regression/Hitters.csv')
data.head()
Output: AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns CRBI CWalks League Division PutOuts Assists Errors Salary NewLeague 0 293 66 1 30 29 14 1 293 66 1 30 29 14 A E 446 33 20 NaN A 1 315 81 7 24 38 39 14 3449 835 69 321 414 375 N W 632 43 10 475.0 N 2 479 130 18 66 72 76 3 1624 457 63 224 266 263 A W 880 82 14 480.0 A 3 496 141 20 65 78 37 11 5628 1575 225 828 838 354 N E 200 11 3 500.0 N 4 321 87 10 39 42 30 2 396 101 12 48 46 33 N E 805 40 4 91.5 N
data.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 322 entries, 0 to 321
Data columns (total 20 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   AtBat      322 non-null    int64
 1   Hits       322 non-null    int64
 2   HmRun      322 non-null    int64
 3   Runs       322 non-null    int64
 4   RBI        322 non-null    int64
 5   Walks      322 non-null    int64
 6   Years      322 non-null    int64
 7   CAtBat     322 non-null    int64
 8   CHits      322 non-null    int64
 9   CHmRun     322 non-null    int64
 10  CRuns      322 non-null    int64
 11  CRBI       322 non-null    int64
 12  CWalks     322 non-null    int64
 13  League     322 non-null    object
 14  Division   322 non-null    object
 15  PutOuts    322 non-null    int64
 16  Assists    322 non-null    int64
 17  Errors     322 non-null    int64
 18  Salary     263 non-null    float64
 19  NewLeague  322 non-null    object
dtypes: float64(1), int64(16), object(3)
memory usage: 50.4+ KB
Data Preprocessing
# Categorical variables
print('The league types are:', data['League'].unique())
print('The divison types are:', data['Division'].unique())
print('The new league options are:', data['NewLeague'].unique())
Output: The league types are: ['A' 'N'] The divison types are: ['E' 'W'] The new league options are: ['A' 'N']
data = pd.get_dummies(data,columns=['League','Division','NewLeague'],drop_first=True)
data.head()
Output: AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns CRBI CWalks PutOuts Assists Errors Salary League_N Division_W NewLeague_N 0 293 66 1 30 29 14 1 293 66 1 30 29 14 446 33 20 NaN False False False 1 315 81 7 24 38 39 14 3449 835 69 321 414 375 632 43 10 475.0 True True True 2 479 130 18 66 72 76 3 1624 457 63 224 266 263 880 82 14 480.0 False True False 3 496 141 20 65 78 37 11 5628 1575 225 828 838 354 200 11 3 500.0 True False True 4 321 87 10 39 42 30 2 396 101 12 48 46 33 805 40 4 91.5 True False True
data.isnull().sum()
Output:
AtBat          0
Hits           0
HmRun          0
Runs           0
RBI            0
Walks          0
Years          0
CAtBat         0
CHits          0
CHmRun         0
CRuns          0
CRBI           0
CWalks         0
PutOuts        0
Assists        0
Errors         0
Salary        59
League_N       0
Division_W     0
NewLeague_N    0
# drop null value containing rows
data = data.dropna()
# Check for multicollinearity
plt.figure(figsize=(11, 9))
sns.heatmap(data.corr(),
            vmin=-1,
            vmax=1,
            cmap="GnBu",
            annot=True)
plt.show()
- Multicollinearity refers to a situation in statistical modeling, particularly in regression analysis, where two or more independent variables (predictors) are highly correlated with each other.
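Besides the correlation heatmap, a common way to quantify multicollinearity is the variance inflation factor (VIF). A minimal sketch using the statsmodels library (not imported elsewhere in this notebook) on the numeric predictor columns might look like this:
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Numeric predictors only; Salary is the target
predictors = data.drop('Salary', axis=1).select_dtypes(include='number')
vif = pd.DataFrame({
    'feature': predictors.columns,
    'VIF': [variance_inflation_factor(predictors.values, i)
            for i in range(predictors.shape[1])]
})
print(vif)  # values far above ~10 usually signal strong multicollinearity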
#Declare the dependent and independent variables
X = data.drop('Salary',axis=1)
y = data.Salary
# Split the data into training and testing parts
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.35, random_state=73)
# Scaling down
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
## Linear regression
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
print("Linear Regression coefficients are: ",lin_reg.coef_)
print("Linear Regression y-intercept is: ",lin_reg.intercept_)
Output: Linear Regression coefficients are: [-393.9133737 503.6635275 34.3385829 -160.82335916 -49.28342409 165.32246388 -104.99406951 -268.95609322 -89.06196931 -3.87451708 647.46120345 296.80824598 -271.5478198 34.25494412 67.17580939 -34.43065695 21.37299731 -54.77478818 -2.21279977] Linear Regression y-intercept is: 546.5721764705883
lin_reg_y_pred = lin_reg.predict(X_test)
lin_reg_y_pred
Output: array([ 222.19005661, 873.38205929, 775.14062148, 754.7171447 , 334.54429243, 408.20275897, 772.0529616 , 974.78068097, 578.33777206, 187.25487315, 327.54633339, 112.4797303 , 697.25850018, 540.10298542, 283.38195557, 109.4523349 , 988.77511654, 309.79093052, 489.91079007, 953.43279222, 258.36257327, 367.21471079, 401.02949611, 1680.50694521, 299.00832322, 1030.26599051, 1457.03215968, 61.73696093, 182.25761793, 1247.8104851 , 391.90351814, 198.09059966, 788.51243361, 361.47237491, 1020.10467438, 1120.95318326, 969.58394911, 415.4729219 , 1014.61362114, 205.88642454, 929.89317634, 1783.32339196, 143.6673837 , 1110.35303634, 619.76806037, 494.25453501, 598.19673317, 1214.07705092, 178.26632166, 203.85172085, 1251.68562123, 619.00628062, 301.46289692, 543.19970809, 548.14686181, 212.62990343, 757.26830761, 392.27665414, 153.68187666, 1537.33821804, 668.40321052, 389.95446438, 817.0923949 , 407.87632067, 901.01311468, 806.86579481, 543.88581245, 620.90930307, 556.95627773, 645.22952718, 1675.5374484 , 871.25151814, 468.94286785, 201.00708586, 1031.22260246, 189.55431306, 330.75782323, 685.13460976, 622.32517952, 302.15795816, 288.8104254 , 329.49762328, 293.74583348, 217.16454661, 256.58938112, 666.7791947 , 842.0010206 , 364.8392116 , 1007.85629463, 548.04060911, 463.16998346, 926.20992318, 619.76640319])
lin_comp = pd.DataFrame({'Predicted': lin_reg_y_pred, 'Actual': y_test})
lin_comp
Output:
     Predicted   Actual
275  222.190057   135.0
121  873.382059   200.0
177  775.140621  1300.0
238  754.717145   580.0
251  334.544292   425.0
print("Linear Regression Model RMSE is: ", math.sqrt(mean_squared_error(y_test, lin_reg_y_pred)))
print("Linear Regression Model Training Score: ",lin_reg.score(X_train, y_train))
print("Linear Regression Model Testing Score: ",lin_reg.score(X_test, y_test))
Output: Linear Regression Model RMSE is: 354.1849220196898 Linear Regression Model Training Score: 0.6011799546413507 Linear Regression Model Testing Score: 0.253242681589842
Perform ridge regression
# Cross Validation
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)
# Initializing the ridge regressor
ridge = RidgeCV(alphas=np.arange(0.1, 10, 0.1), cv=cv, scoring='neg_mean_absolute_error')
# Fitting the ridge regressor
ridge.fit(X_train,y_train)
ridge_reg_y_pred = ridge.predict(X_test)
print("Ridge tuning parameter:", (ridge.alpha_))
print ("Ridge model coefficients:", (ridge.coef_))
print ("Ridge model intercept:", (ridge.intercept_))
Output: Ridge tuning parameter: 9.9 Ridge model coefficients: [-117.07769463 171.87437152 -37.49273895 -9.68483689 11.10827022 80.22126236 -91.88294859 -8.92215114 104.84749774 101.42600928 121.93640341 99.06795238 -91.56804842 35.52525912 39.07253945 -36.09735751 20.00165719 -58.22366564 -0.27411841] Ridge model intercept: 546.5721764705883
print("Ridge Regression Model RMSE is: ", math.sqrt(mean_squared_error(y_test, ridge_reg_y_pred)))
print("Ridge Regression Model Training Score: ",ridge.score(X_train, y_train))
print("Ridge Regression Model Testing Score: ",ridge.score(X_test, y_test))
Output: Ridge Regression Model RMSE is: 331.2307351224517 Ridge Regression Model Training Score: 0.5570347617071545 Ridge Regression Model Testing Score: 0.3468986122864255
Perform lasso regression
# Initializing the lasso regressor
lasso = LassoCV(alphas=np.arange(0.1, 10, 0.1), cv=cv, tol=1)
# Fitting the lasso regressor
lasso.fit(X_train,y_train)
lasso_reg_y_pred = lasso.predict(X_test)
print("Lasso tuning parameter:", (lasso.alpha_))
print ("Lasso model coefficients:", (lasso.coef_))
print ("Lassso model intercept:", (lasso.intercept_))
Output: Lasso tuning parameter: 0.8 Lasso model coefficients: [118.60882122 61.29348915 -12.90165159 -8.08876617 20.98119598 60.93840301 123.38114873 70.82458489 23.11661398 44.11595949 -23.30516437 4.19798778 -29.09075 27.06294716 -0.31059369 -47.38188738 21.78416346 -68.05616091 5.88532458] Lassso model intercept: 546.5721764705883
print("Lasso Regression Model RMSE is: ", math.sqrt(mean_squared_error(y_test, lasso_reg_y_pred)))
print("Lasso Regression Model Training Score: ",lasso.score(X_train, y_train))
print("Lasso Regression Model Testing Score: ",lasso.score(X_test, y_test))
Output: Lasso Regression Model RMSE is: 327.7718980528511 Lasso Regression Model Training Score: 0.4243398686558457 Lasso Regression Model Testing Score: 0.3604672611701014
Compare the scores
print("Linear Regression Model Training Score: ",lin_reg.score(X_train, y_train))
print("Linear Regression Model Testing Score: ",lin_reg.score(X_test, y_test))
print("Ridge Regression Model Training Score: ",ridge.score(X_train, y_train))
print("Ridge Regression Model Testing Score: ",ridge.score(X_test, y_test))
print("Lasso Regression Model Training Score: ",lasso.score(X_train, y_train))
print("Lasso Regression Model Testing Score: ",lasso.score(X_test, y_test))
Output: Linear Regression Model Training Score: 0.6011799546413507 Linear Regression Model Testing Score: 0.253242681589842 Ridge Regression Model Training Score: 0.5570347617071545 Ridge Regression Model Testing Score: 0.3468986122864255 Lasso Regression Model Training Score: 0.4243398686558457 Lasso Regression Model Testing Score: 0.3604672611701014
print("Linear Regression Model RMSE is: ", math.sqrt(mean_squared_error(y_test, lin_reg_y_pred)))
print("Ridge Regression Model RMSE is: ", math.sqrt(mean_squared_error(y_test, ridge_reg_y_pred)))
print("Lasso Regression Model RMSE is: ", math.sqrt(mean_squared_error(y_test, lasso_reg_y_pred)))
Output: Linear Regression Model RMSE is: 354.1849220196898 Ridge Regression Model RMSE is: 331.2307351224517 Lasso Regression Model RMSE is: 327.7718980528511