Ridge and Lasso Regression
Ridge and lasso regression are machine learning algorithms with built-in regularization functionality.
Built upon the essentials of linear regression with an additional penalty term, they serve as a calibration tool for preventing overfitting.
Categorically, ridge and lasso regressions are both regularization methods.
Regularization is an approach that is particularly effective when our data also suffers from multicollinearity.
Multicollinearity means that the independent variables in the regression model are highly correlated with each other.
What is Regularization?
Regularization is a technique that prevents overfitting by adding extra information to the model.
We use it in regression to keep the model from fixating too much on noise and irrelevant data.
In simple terms, regularization refers to a range of techniques that aim to make your model simpler.
It works by deliberately introducing a small amount of bias into the model so that it does not fit the training data points too perfectly.
Types of Regularization
- Lasso regression uses L-1 regularization and
- Ridge regression uses L-2 regularization.
What separates them is the form of the additional information, known as the penalty term, which serves as the regularization component.
L-1 regularization applies an L-1 penalty equal to the absolute value of the magnitude of the coefficients.
- It restricts the size of the coefficients, making some of them equal to zero. Mathematically, the L-1 penalty term is represented by the following formula: ∑ |βj|
L-2 regularization, on the other hand, adds an L-2 penalty equal to the square of the magnitude of the coefficients.
Here, all coefficients are shrunk by the same factor. Their values become closer to zero, but they are never actually zero.
Mathematically, the L-2 penalty term is represented by the following formula: ∑ βj^2
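As a quick illustration, here is a minimal sketch (using NumPy and a purely hypothetical coefficient vector) of how the two penalty terms are computed:
import numpy as np
# Hypothetical coefficient vector, for illustration only
beta = np.array([0.5, -1.2, 0.0, 3.4])
l1_penalty = np.sum(np.abs(beta))  # L-1 penalty: sum of absolute values -> ~5.1
l2_penalty = np.sum(beta ** 2)     # L-2 penalty: sum of squared values -> ~13.25
print("L-1 penalty:", l1_penalty)
print("L-2 penalty:", l2_penalty)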
Ridge Regression
Ridge regression is essentially a regularization technique for dealing with overfitted data.
It is a linear regression with an additional penalty term equal to the square of the magnitude of the coefficients.
To define the right relationship between independent and dependent variables with a linear regression, we use a cost function that minimizes the sum of the squared differences between predicted and actual values. In other words, the aim is to find the best possible values for the intercept and the slope in order to obtain the smallest errors. That’s why it is called the least-squares cost function, and it looks like this:
∑(Ŷi − Yi)^2
- Ŷi – predicted values
- Yi – actual values
In ridge regression, we don’t want to minimize only the squared error, but also the additional regularization penalty term, controlled by a tuning parameter.
This parameter determines how much bias we’ll add to the model and is most often denoted by lambda (λ). Ridge regression uses the L-2 penalty term: λ∑βj^2
- λ – a tuning parameter controlling the penalty term
- The higher the value of lambda, the bigger the penalty. If lambda equals zero, ridge regression reduces to a regular least-squares regression.
The proper value of lambda is most often estimated with the help of a technique called cross-validation. Applying an appropriate value for the tuning parameter should:
- prevent multicollinearity and overfitting from occurring
- reduce the model’s complexity
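To make the ridge objective concrete, here is a minimal sketch (with hypothetical NumPy arrays for the actual values, predictions, and coefficients) of the least-squares error combined with the L-2 penalty; note that with lam=0 it falls back to plain least squares:
import numpy as np

def ridge_cost(y_actual, y_pred, beta, lam):
    # Least-squares error plus the L-2 penalty scaled by lambda
    squared_error = np.sum((y_pred - y_actual) ** 2)
    l2_penalty = lam * np.sum(beta ** 2)
    return squared_error + l2_penalty

# Hypothetical values, for illustration only
y_actual = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])
beta = np.array([1.2, -0.8])
print(ridge_cost(y_actual, y_pred, beta, lam=0.0))  # 1.5 (plain least squares)
print(ridge_cost(y_actual, y_pred, beta, lam=1.0))  # ~3.58 (penalized objective)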
Lasso Regression
In lasso regression, the penalty term is the sum of the absolute values (magnitudes) of the coefficients.
Lasso uses L-1 regularization: λ∑|βj|
- λ – a tuning parameter
Note:
In the Python (scikit-learn) implementation, lambda (λ) is passed as the alpha parameter.
β represents the coefficients (slopes) of the regression line.
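For example, in scikit-learn the penalty strength is passed through the alpha argument (the values below are arbitrary):
from sklearn.linear_model import Ridge, Lasso
ridge_model = Ridge(alpha=1.0)  # alpha plays the role of lambda
lasso_model = Lasso(alpha=0.5)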
Difference between Lasso & Ridge regression
Conceptually, the two methods have the same goal – to increase the bias and lower the variance in order to prevent overfitting.
The major difference between the two algorithms is that ridge shrinks the coefficients so they become closer to zero but never exactly zero, while lasso can shrink them all the way to zero.
What the lasso regression does is decrease the values of the irrelevant parameters to zero, so that they don’t participate in the equation.
This way, our model only has variables that are important for the predictions.
Such a process is also known as feature selection as it excludes the irrelevant variables from the equation and leaves us with a subset containing only the useful ones.
A huge benefit of using a lasso regression is that it’s very suitable when dealing with big datasets because it can easily lower the variance in models with many features.
y = β0 + β1x1 + β2x2 + … + βnxn
⇓
y = β0 + 0·x1 + 0·x2 + … + βnxn
⇓
y = β0 + … + βnxn
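Below is a minimal sketch on a small synthetic dataset (purely for illustration) showing how lasso can drive irrelevant coefficients to exactly zero while ridge only shrinks them:
import numpy as np
from sklearn.linear_model import Lasso, Ridge
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually influence the target
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=10).fit(X, y)
print("Lasso coefficients:", lasso.coef_)  # irrelevant features are typically exactly 0
print("Ridge coefficients:", ridge.coef_)  # shrunk towards zero, but not exactly 0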
Python Implementation for Ridge & Lasso regression:
# import necessary libraries
import pandas as pd
import numpy as np
import math
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import RepeatedKFold
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import RidgeCV
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
# Load dataset
data = pd.read_csv('/content/drive/MyDrive/Data Science/CDS-07-Machine Learning & Deep Learning/06. Machine Learning Model /03_Ridge Lasso Regression/Hitters.csv')
data.head()
Output: AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns CRBI CWalks League Division PutOuts Assists Errors Salary NewLeague 0 293 66 1 30 29 14 1 293 66 1 30 29 14 A E 446 33 20 NaN A 1 315 81 7 24 38 39 14 3449 835 69 321 414 375 N W 632 43 10 475.0 N 2 479 130 18 66 72 76 3 1624 457 63 224 266 263 A W 880 82 14 480.0 A 3 496 141 20 65 78 37 11 5628 1575 225 828 838 354 N E 200 11 3 500.0 N 4 321 87 10 39 42 30 2 396 101 12 48 46 33 N E 805 40 4 91.5 N
data.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 322 entries, 0 to 321
Data columns (total 20 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   AtBat      322 non-null    int64
 1   Hits       322 non-null    int64
 2   HmRun      322 non-null    int64
 3   Runs       322 non-null    int64
 4   RBI        322 non-null    int64
 5   Walks      322 non-null    int64
 6   Years      322 non-null    int64
 7   CAtBat     322 non-null    int64
 8   CHits      322 non-null    int64
 9   CHmRun     322 non-null    int64
 10  CRuns      322 non-null    int64
 11  CRBI       322 non-null    int64
 12  CWalks     322 non-null    int64
 13  League     322 non-null    object
 14  Division   322 non-null    object
 15  PutOuts    322 non-null    int64
 16  Assists    322 non-null    int64
 17  Errors     322 non-null    int64
 18  Salary     263 non-null    float64
 19  NewLeague  322 non-null    object
dtypes: float64(1), int64(16), object(3)
memory usage: 50.4+ KB
Data Preprocessing
# Categorical variables
print('The league types are:', data['League'].unique())
print('The divison types are:', data['Division'].unique())
print('The new league options are:', data['NewLeague'].unique())
Output: The league types are: ['A' 'N'] The divison types are: ['E' 'W'] The new league options are: ['A' 'N']
data = pd.get_dummies(data,columns=['League','Division','NewLeague'],drop_first=True)
data.head()
Output: AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns CRBI CWalks PutOuts Assists Errors Salary League_N Division_W NewLeague_N 0 293 66 1 30 29 14 1 293 66 1 30 29 14 446 33 20 NaN False False False 1 315 81 7 24 38 39 14 3449 835 69 321 414 375 632 43 10 475.0 True True True 2 479 130 18 66 72 76 3 1624 457 63 224 266 263 880 82 14 480.0 False True False 3 496 141 20 65 78 37 11 5628 1575 225 828 838 354 200 11 3 500.0 True False True 4 321 87 10 39 42 30 2 396 101 12 48 46 33 805 40 4 91.5 True False True
data.isnull().sum()
Output:
AtBat          0
Hits           0
HmRun          0
Runs           0
RBI            0
Walks          0
Years          0
CAtBat         0
CHits          0
CHmRun         0
CRuns          0
CRBI           0
CWalks         0
PutOuts        0
Assists        0
Errors         0
Salary        59
League_N       0
Division_W     0
NewLeague_N    0
# drop null value containing rows
data = data.dropna()
# Check for multicollinearity
plt.figure(figsize=(11, 9))
sns.heatmap(data.corr(),
            vmin=-1,
            vmax=1,
            cmap="GnBu",
            annot=True)
plt.show()
- Multicollinearity refers to a situation in statistical modeling, particularly in regression analysis, where two or more independent variables (predictors) are highly correlated with each other.
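Besides the correlation heatmap, a common way to quantify multicollinearity is the variance inflation factor (VIF). A minimal sketch using the statsmodels library (not imported elsewhere in this notebook) on the numeric predictor columns might look like this:
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Numeric predictors only; Salary is the target
predictors = data.drop('Salary', axis=1).select_dtypes(include='number')
vif = pd.DataFrame({
    'feature': predictors.columns,
    'VIF': [variance_inflation_factor(predictors.values, i)
            for i in range(predictors.shape[1])]
})
print(vif)  # values far above ~10 usually signal strong multicollinearity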
#Declare the dependent and independent variables
X = data.drop('Salary',axis=1)
y = data.Salary
# Split the data into training and testing parts
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.35, random_state=73)
# Scaling down
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
## Linear regression
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
print("Linear Regression coefficients are: ",lin_reg.coef_)
print("Linear Regression y-intercept is: ",lin_reg.intercept_)
Output: Linear Regression coefficients are: [-393.9133737 503.6635275 34.3385829 -160.82335916 -49.28342409 165.32246388 -104.99406951 -268.95609322 -89.06196931 -3.87451708 647.46120345 296.80824598 -271.5478198 34.25494412 67.17580939 -34.43065695 21.37299731 -54.77478818 -2.21279977] Linear Regression y-intercept is: 546.5721764705883
lin_reg_y_pred = lin_reg.predict(X_test)
lin_reg_y_pred
Output: array([ 222.19005661, 873.38205929, 775.14062148, 754.7171447 , 334.54429243, 408.20275897, 772.0529616 , 974.78068097, 578.33777206, 187.25487315, 327.54633339, 112.4797303 , 697.25850018, 540.10298542, 283.38195557, 109.4523349 , 988.77511654, 309.79093052, 489.91079007, 953.43279222, 258.36257327, 367.21471079, 401.02949611, 1680.50694521, 299.00832322, 1030.26599051, 1457.03215968, 61.73696093, 182.25761793, 1247.8104851 , 391.90351814, 198.09059966, 788.51243361, 361.47237491, 1020.10467438, 1120.95318326, 969.58394911, 415.4729219 , 1014.61362114, 205.88642454, 929.89317634, 1783.32339196, 143.6673837 , 1110.35303634, 619.76806037, 494.25453501, 598.19673317, 1214.07705092, 178.26632166, 203.85172085, 1251.68562123, 619.00628062, 301.46289692, 543.19970809, 548.14686181, 212.62990343, 757.26830761, 392.27665414, 153.68187666, 1537.33821804, 668.40321052, 389.95446438, 817.0923949 , 407.87632067, 901.01311468, 806.86579481, 543.88581245, 620.90930307, 556.95627773, 645.22952718, 1675.5374484 , 871.25151814, 468.94286785, 201.00708586, 1031.22260246, 189.55431306, 330.75782323, 685.13460976, 622.32517952, 302.15795816, 288.8104254 , 329.49762328, 293.74583348, 217.16454661, 256.58938112, 666.7791947 , 842.0010206 , 364.8392116 , 1007.85629463, 548.04060911, 463.16998346, 926.20992318, 619.76640319])
lin_comp = pd.DataFrame({'Predicted': lin_reg_y_pred, 'Actual': y_test})
lin_comp
Output:
     Predicted   Actual
275  222.190057   135.0
121  873.382059   200.0
177  775.140621  1300.0
238  754.717145   580.0
251  334.544292   425.0
print("Linear Regression Model RMSE is: ", math.sqrt(mean_squared_error(y_test, lin_reg_y_pred)))
print("Linear Regression Model Training Score: ",lin_reg.score(X_train, y_train))
print("Linear Regression Model Testing Score: ",lin_reg.score(X_test, y_test))
Output: Linear Regression Model RMSE is: 354.1849220196898 Linear Regression Model Training Score: 0.6011799546413507 Linear Regression Model Testing Score: 0.253242681589842
Perform ridge regression
# Cross Validation
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)
# Initializing the ridge regressor
ridge = RidgeCV(alphas=np.arange(0.1, 10, 0.1), cv=cv, scoring='neg_mean_absolute_error')
# Fitting the ridge regressor
ridge.fit(X_train,y_train)
ridge_reg_y_pred = ridge.predict(X_test)
print("Ridge tuning parameter:", (ridge.alpha_))
print ("Ridge model coefficients:", (ridge.coef_))
print ("Ridge model intercept:", (ridge.intercept_))
Output: Ridge tuning parameter: 9.9 Ridge model coefficients: [-117.07769463 171.87437152 -37.49273895 -9.68483689 11.10827022 80.22126236 -91.88294859 -8.92215114 104.84749774 101.42600928 121.93640341 99.06795238 -91.56804842 35.52525912 39.07253945 -36.09735751 20.00165719 -58.22366564 -0.27411841] Ridge model intercept: 546.5721764705883
print("Ridge Regression Model RMSE is: ", math.sqrt(mean_squared_error(y_test, ridge_reg_y_pred)))
print("Ridge Regression Model Training Score: ",ridge.score(X_train, y_train))
print("Ridge Regression Model Testing Score: ",ridge.score(X_test, y_test))
Output: Ridge Regression Model RMSE is: 331.2307351224517 Ridge Regression Model Training Score: 0.5570347617071545 Ridge Regression Model Testing Score: 0.3468986122864255
Perform lasso regression
# Initializing the lasso regressor
lasso = LassoCV(alphas=np.arange(0.1, 10, 0.1), cv=cv, tol=1)
# Fitting the lasso regressor
lasso.fit(X_train,y_train)
lasso_reg_y_pred = lasso.predict(X_test)
print("Lasso tuning parameter:", (lasso.alpha_))
print ("Lasso model coefficients:", (lasso.coef_))
print ("Lassso model intercept:", (lasso.intercept_))
Output: Lasso tuning parameter: 0.8 Lasso model coefficients: [118.60882122 61.29348915 -12.90165159 -8.08876617 20.98119598 60.93840301 123.38114873 70.82458489 23.11661398 44.11595949 -23.30516437 4.19798778 -29.09075 27.06294716 -0.31059369 -47.38188738 21.78416346 -68.05616091 5.88532458] Lassso model intercept: 546.5721764705883
print("Lasso Regression Model RMSE is: ", math.sqrt(mean_squared_error(y_test, lasso_reg_y_pred)))
print("Lasso Regression Model Training Score: ",lasso.score(X_train, y_train))
print("Lasso Regression Model Testing Score: ",lasso.score(X_test, y_test))
Output: Lasso Regression Model RMSE is: 327.7718980528511 Lasso Regression Model Training Score: 0.4243398686558457 Lasso Regression Model Testing Score: 0.3604672611701014
Compare the scores
print("Linear Regression Model Training Score: ",lin_reg.score(X_train, y_train))
print("Linear Regression Model Testing Score: ",lin_reg.score(X_test, y_test))
print("Ridge Regression Model Training Score: ",ridge.score(X_train, y_train))
print("Ridge Regression Model Testing Score: ",ridge.score(X_test, y_test))
print("Lasso Regression Model Training Score: ",lasso.score(X_train, y_train))
print("Lasso Regression Model Testing Score: ",lasso.score(X_test, y_test))
Output: Linear Regression Model Training Score: 0.6011799546413507 Linear Regression Model Testing Score: 0.253242681589842 Ridge Regression Model Training Score: 0.5570347617071545 Ridge Regression Model Testing Score: 0.3468986122864255 Lasso Regression Model Training Score: 0.4243398686558457 Lasso Regression Model Testing Score: 0.3604672611701014
print("Linear Regression Model RMSE is: ", math.sqrt(mean_squared_error(y_test, lin_reg_y_pred)))
print("Ridge Regression Model RMSE is: ", math.sqrt(mean_squared_error(y_test, ridge_reg_y_pred)))
print("Lasso Regression Model RMSE is: ", math.sqrt(mean_squared_error(y_test, lasso_reg_y_pred)))
Output: Linear Regression Model RMSE is: 354.1849220196898 Ridge Regression Model RMSE is: 331.2307351224517 Lasso Regression Model RMSE is: 327.7718980528511