Linear Regression:
What:
- It is a supervised machine-learning algorithm.
- Used to predict the value of a variable based on the value of one or more other variables.
- Used for predictive analysis.
- Used to determine the linear relationship between the dependent variable (y) and the independent variables (x).
- This linear relationship is represented by a straight line called the regression line or best-fit line.
- This line is the pattern the machine has learned from the data.
- Used for predicting a quantitative (continuous) output, e.g. age, salary, price, etc.
- The regression line's range is -∞ to +∞.
Types:
- Simple Linear Regression:
- Formula: y = mx + c
Where,
- y is the response or target variable
- x is the predictor variable/input
- m is the slope/coefficient of x
- c is the intercept/constant
- Multiple Linear Regression:
- Formula: y = m1x1 + m2x2 + ... + mnxn + c
Where,
- y is the response or target variable
- x1, x2, x3, ..., xn represent the features
- m1, m2, m3, ..., mn represent the coefficients of x1, x2, x3, ..., xn respectively
- c is the intercept/constant
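For concreteness, here is a minimal sketch of these formulas in Python; the slope, coefficient and intercept values are made up for illustration and not fitted from any data:
import numpy as np
# Simple linear regression: y = m*x + c (illustrative values, not fitted)
m, c = 2.5, 1.0
x = 4.0
y_simple = m * x + c  # 2.5*4 + 1 = 11.0
# Multiple linear regression: y = m1*x1 + m2*x2 + ... + mn*xn + c
coefficients = np.array([0.05, 0.20, 0.01])  # m1, m2, m3 (hypothetical)
features = np.array([230.1, 37.8, 69.2])     # x1, x2, x3 (one observation)
intercept = 3.0                               # c (hypothetical)
y_multiple = np.dot(coefficients, features) + intercept
print(y_simple, y_multiple)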
Loss/Cost Function:
- It is the function that signifies how much the predicted values deviate from the actual values.
- MSE (Mean Squared Error) is the most commonly used cost function for linear regression.
- MSE is the mean of the squared differences between the predicted and actual values.
- The output of MSE is a single number representing the cost.
- MSE = (1/n) * Σ (yi - yi_pred)^2; replacing yi_pred with mxi + c gives MSE = (1/n) * Σ (yi - (mxi + c))^2.
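As a quick illustration, a minimal sketch of the MSE calculation with small hand-made arrays (the numbers are assumptions, not taken from the dataset used later):
import numpy as np
x = np.array([1.0, 2.0, 3.0, 4.0])
y_actual = np.array([3.0, 5.0, 7.0, 9.0])   # actual values (hypothetical)
m, c = 1.8, 0.5                              # candidate slope and intercept
y_pred = m * x + c                           # y_pred_i = m*x_i + c
mse = np.mean((y_actual - y_pred) ** 2)      # single number representing the cost
print(mse)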
Gradient Descent:
It is an optimization algorithm used to find the optimal values of the parameters that minimize the cost function.
Updating the parameters of the cost function iteratively in the direction of the negative gradient, until the minimum of the cost function is reached, is called the gradient descent algorithm.
It helps to find the optimal value of the slope m, which gives the best-fit line.
Our aim is to minimize the error between the predicted values and the actual values.
The gradient descent curve plots the cost function against the slope values.
The algorithm starts with a randomly selected m value and from there uses calculus (derivatives) to iteratively adjust m, calculating the cost function for each slope.
It then searches across these error values for the minimum error and builds the best-fit line using that m.
The randomly selected m might not correspond to the global minimum, so we need to move down the curve; for that we repeat the update step until the algorithm converges.
- N.B.: A convergence theorem is a mathematical result describing how a sequence or series of values approaches a specific limit (examples include the Monotone Convergence Theorem, the Cauchy Convergence Criterion, and the Bolzano-Weierstrass Theorem). For gradient descent, "convergence" simply means that the repeated updates of m (and c) eventually settle at values where the cost stops decreasing.
- The learning rate should be a small value, typically between 0.1 and 0.0000001. It controls the size of the step the gradient takes during gradient descent. Setting it too high makes the path unstable; setting it too low makes convergence slow. Setting it to zero means the model does not learn anything from the gradients.
To find the derivative of the cost at a given slope value, we draw a tangent at that point on the curve.
If the right-hand side of the tangent points downwards, the tangent has a negative slope, so the derivative at that point is negative. Hence we need to increase the m value to move towards the global minimum.
If the right-hand side of the tangent points upwards, the tangent has a positive slope, so the derivative at that point is positive. Hence we need to decrease the m value to move towards the global minimum.
Fig: Gradient descent algorithm
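Below is a minimal sketch of gradient descent for simple linear regression; the toy data, learning rate and iteration count are assumptions chosen only for illustration:
import numpy as np
# Toy data roughly following y = 2x + 1 (hypothetical)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])
m, c = 0.0, 0.0        # start from arbitrary parameter values
learning_rate = 0.01   # small step size, as discussed above
n = len(x)
for _ in range(2000):
    y_pred = m * x + c
    dm = (-2 / n) * np.sum(x * (y - y_pred))  # derivative of MSE w.r.t. m
    dc = (-2 / n) * np.sum(y - y_pred)        # derivative of MSE w.r.t. c
    m = m - learning_rate * dm                # move opposite to the gradient
    c = c - learning_rate * dc
print(m, c)  # should approach the best-fit slope and intercept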
Assumptions of the Linear Regression Model:
Linearity: a linear relationship exists between the dependent and independent variables.
In case of non-linearity, use a transformation such as logarithmic, exponential, square root, etc.
No multicollinearity: if there is multicollinearity, it is unclear which independent variable explains the dependent variable.
Normality of errors: errors should be normally distributed; if not, confidence intervals may become too wide or too narrow. A small sketch of how to check these assumptions is shown below.
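A minimal sketch of how these assumptions could be checked; the VIF here is computed manually with scikit-learn, and the names X and y are assumed to hold the features and target defined later in the walkthrough:
import pandas as pd
from sklearn.linear_model import LinearRegression

def vif(X):
    # VIF_i = 1 / (1 - R^2_i), regressing feature i on the remaining features
    scores = {}
    for col in X.columns:
        others = X.drop(columns=col)
        r2 = LinearRegression().fit(others, X[col]).score(others, X[col])
        scores[col] = 1.0 / (1.0 - r2)
    return pd.Series(scores)

# Usage (assuming X and y are the feature DataFrame and target Series):
# print(vif(X))                                           # values well above ~5-10 hint at multicollinearity
# residuals = y - LinearRegression().fit(X, y).predict(X)
# residuals.plot(kind='hist')                             # roughly bell-shaped if errors are normal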
Pros:
- Simple method
- Easy to use and understand
Cons:
- Very sensitive to outliers
- Performs well only when the relationship between the variables is approximately linear
Python Implementation for Linear Regression:
Business Case: To predict total sales using features describing the money spent on marketing for individual channels.
Feature descriptions received from the domain expert:
- TV :- Amount spent on TV advertisement.
- Radio :- Amount spent on Radio advertisement.
- Newspaper :- Amount spent on Newspaper advertisement.
- Sales :- Sales of the product.
# importing basic libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Load dataset
sales_data = pd.read_csv('/content/drive/MyDrive/Data Science/CDS-07-Machine Learning & Deep Learning/06. Machine Learning Model /01_Linear Regression/Linear Regression Class/Data Set/Advertising.csv')
Basic Checks & Domain Analysis:
sales_data.head()
Output:
   Unnamed: 0     TV  Radio  Newspaper  Sales
0           1  230.1   37.8       69.2   22.1
1           2   44.5   39.3       45.1   10.4
2           3   17.2   45.9       69.3    9.3
3           4  151.5   41.3       58.5   18.5
4           5  180.8   10.8       58.4   12.9
sales_data.tail()
Output:
     Unnamed: 0     TV  Radio  Newspaper  Sales
195         196   38.2    3.7       13.8    7.6
196         197   94.2    4.9        8.1    9.7
197         198  177.0    9.3        6.4   12.8
198         199  283.6   42.0       66.2   25.5
199         200  232.1    8.6        8.7   13.4
sales_data.describe()
Output:
       Unnamed: 0          TV       Radio   Newspaper       Sales
count  200.000000  200.000000  200.000000  200.000000  200.000000
mean   100.500000  147.042500   23.264000   30.554000   14.022500
std     57.879185   85.854236   14.846809   21.778621    5.217457
min      1.000000    0.700000    0.000000    0.300000    1.600000
25%     50.750000   74.375000    9.975000   12.750000   10.375000
50%    100.500000  149.750000   22.900000   25.750000   12.900000
75%    150.250000  218.825000   36.525000   45.100000   17.400000
max    200.000000  296.400000   49.600000  114.000000   27.000000
sales_data.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Unnamed: 0  200 non-null    int64
 1   TV          200 non-null    float64
 2   Radio       200 non-null    float64
 3   Newspaper   200 non-null    float64
 4   Sales       200 non-null    float64
dtypes: float64(4), int64(1)
memory usage: 7.9 KB
sales_data.shape
Output: (200, 5)
Observations from basic checks
- The column 'Unnamed: 0' is an ID column that is not needed for modeling, so we will drop it
- Number of rows is 200 and number of columns is 5
- 3 independent variables (TV, Radio, Newspaper)
- 1 dependent or target variable, Sales, which depends on the other 3 independent variables
- The four variables are not on the same scale, so scaling would be required
- No missing values are present in the data set
- Expense on TV advertisement is high compared to Radio & Newspaper
- No categorical values are present in the data set
Exploratory Data Analysis:
Step 01:Univariate Analysis
# Analyzing TV
sns.histplot(x=sales_data["TV"],kde=True)
# Analyzing Radio
sns.histplot(x=sales_data["Radio"],kde=True)
# Analyzing Newspaper
sns.histplot(x=sales_data["Newspaper"],kde=True)
Observations from Univariate Analysis
- No clear pattern in the TV & Radio data
- The Newspaper data is right skewed
Step 02:Bivariate Analysis
# Analyzing TV And Sales
sns.relplot(x='TV',y='Sales',data=sales_data)
# Analyzing Radio And Sales
sns.relplot(x='Radio',y='Sales',data=sales_data)
# Analyzing Newspaper And Sales
sns.relplot(x='Newspaper',y='Sales',data=sales_data)
Observations from Bivariate Analysis
- The marketing spend on TV is leading to more sales of the product
- No specific trend shows for Radio advertising against sales
- No specific trend shows for Newspaper advertising against sales
Step 03:Multivariate Analysis
# Analyzing all 3 independent variables against the Sales variable
sns.pairplot(sales_data.drop('Unnamed: 0',axis=1))
Data Preprocessing and Feature Engineering:
Step 01: Imputing Missing values
- As there are no missing values in the data set, we are skipping this step
Step 02: Converting categorical data to numerical data
- As there is no categorical data, we are skipping this step
Step 03: Checking & handling Outliers
# Checking for TV data
sns.boxplot(x='TV',data=sales_data)
# Checking for Radio data
sns.boxplot(x='Radio',data=sales_data)
# Checking for Newspaper data
sns.boxplot(x='Newspaper',data=sales_data)
Observations from Checking Outliers
Only the Newspaper data has outliers
We are not removing these outliers for now
Step 04: Scaling down the continuous variable
- Although our variables are not all on the same scale, we are not applying a scaling technique here to keep this blog simpler (a sketch is shown below)
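For reference, a minimal sketch of how scaling could be applied with scikit-learn's StandardScaler (not used in the rest of this walkthrough):
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Each column is rescaled to zero mean and unit variance
scaled_features = scaler.fit_transform(sales_data[['TV', 'Radio', 'Newspaper']])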
Step 05: Transformation
- As our data is not normally distributed, we should transform it towards a normal distribution, but we are not doing so here to keep this blog simpler (a sketch is shown below)
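Similarly, a log transformation of the right-skewed Newspaper column would be one option; a sketch of what that could look like (not applied here):
import numpy as np
# log1p handles zero values safely; this would replace the skewed Newspaper column
newspaper_log = np.log1p(sales_data['Newspaper'])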
Feature Selection:
Step 01: Dropping the unwanted variables
sales_data.drop('Unnamed: 0',axis=1,inplace=True)
sales_data
Output: TV Radio Newspaper Sales 0 230.1 37.8 69.2 22.1 1 44.5 39.3 45.1 10.4 2 17.2 45.9 69.3 9.3 3 151.5 41.3 58.5 18.5 4 180.8 10.8 58.4 12.9 .. ... ... ... ... 195 38.2 3.7 13.8 7.6 196 94.2 4.9 8.1 9.7 197 177.0 9.3 6.4 12.8 198 283.6 42.0 66.2 25.5 199 232.1 8.6 8.7 13.4 [200 rows x 4 columns]
Step 02: Checking the Correlation
- We will use heatmap here
sns.heatmap(sales_data.drop("Sales",axis=1).corr(),annot=True) # dropping Sales as it is the output/target
Observations from the correlation check
No feature is highly correlated with another feature
So we will use all 3 features as input
Model Creation:
Step 01: Creating independent & dependent variables
Conventionally, the independent variables are represented by X
Conventionally, the dependent variable is represented by y
X = sales_data.iloc[:,0:3]
y = sales_data.iloc[:,3]
Step 02: Creating Training & Testing data
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2 ,random_state=45)
Step 03: Creating Model
from sklearn.linear_model import LinearRegression
model = LinearRegression() # object creation
model.fit(X_train,y_train) # training linear regression
y_predict= model.predict(X_test)
# See predicted y values & actual y value
print(y_predict)
print('----------------------------------')
print(np.array(y_test))
Output:
[15.18887309 10.2054111 16.43931961 21.80818887 15.88752137 8.92680199 18.13567301 11.36589433 17.39755473 8.66950442 11.4822015 9.719351 12.1396776 19.13491661 16.94206504 6.52793621 14.05605199 7.77833624 21.09549852 12.35393889 19.24140535 7.51159355 17.35753103 10.14557775 17.14293028 7.03827428 20.44646647 12.24372302 15.01515604 14.31985601 23.18859932 20.39708782 19.89616957 16.52262551 9.97604212 10.09042996 16.8580678 18.25948647 13.17938188 19.53806065]
----------------------------------
[14.9 8.8 16.6 23.8 12. 9.7 19. 11.8 18.5 8.5 10.8 10.1 11.7 17.4 15.7 8.7 14.1 9.7 22.3 10.8 19.6 7.6 12.8 10.1 17.3 8.6 20.7 11.7 15. 14.5 25.4 22.1 19.8 17.3 11.6 11.3 18. 15. 12.9 18.9]
y_predict.shape
Output: (40,)
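To connect the fitted model back to the formula y = m1x1 + m2x2 + m3x3 + c, the learned coefficients and intercept can be inspected (output values not shown here):
# Learned slopes for TV, Radio and Newspaper, and the intercept c
print(model.coef_)
print(model.intercept_)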
Model Evaluation:
from sklearn.metrics import r2_score,mean_squared_error,mean_absolute_error
#r2 Score
r2 = r2_score(y_test,y_predict)
r2
Output: 0.8955882331233612
X_test.shape
Output: (40, 3)
#Adjusted r2 score
adjusted_r2 = 1-(1-r2)*(40-1)/(40-3-1)
adjusted_r2
Output: 0.886887252550308
Adjusted R2 = 1 – [(1-R2)*(n-1)/(n-k-1)]
where:
- R2: The R2 of the model
- n: The number of observations
- k: The number of predictor variables
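The same calculation can be wrapped into a small helper function; a sketch reproducing the hand computation above:
def adjusted_r2_score(r2, n_observations, n_predictors):
    # Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)
    return 1 - (1 - r2) * (n_observations - 1) / (n_observations - n_predictors - 1)

print(adjusted_r2_score(r2, X_test.shape[0], X_test.shape[1]))  # same value as above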
# mean Square Error(MSE)
MSE = mean_squared_error(y_test,y_predict)
MSE
Output: 2.256494247280935
# root mean square error(RMSE)
import math
RMSE = math.sqrt(MSE)
RMSE
Output: 1.5021631892976657
# mean absolute error
MAE = mean_absolute_error(y_test,y_predict)
MAE
Output: 1.0788802763848646