Linear Regression:

What:

  • It is a supervised machine-learning algorithm
  • Used to predict the value of one variable based on the value of another variable.
  • Used for predictive analysis.
  • Used to determine the linear relationship between the dependent variable (y) and the independent variables (x)
  • This linear relationship is represented by a straight line called the regression line or best-fit line
  • This line is the pattern the machine has learned from the data
  • Used for predicting output of quantitative type (continuous values), e.g. age, salary, price, etc.
  • The regression line ranges from −∞ to +∞

Types:

  1. Simple Linear Regression:
  • Formula: y = mx + c

    Where,

    • y is the response or target variable
    • x is the predictor variable/input
    • m is the slope/coefficient of x
    • c is the intercept/constant
  2. Multiple Linear Regression:
  • Formula: y = m1x1 + m2x2 + … + mnxn + c
    Where,
    • y is the response or target variable
    • x1, x2, x3, …xn represents the features
    • m1, m2, m3, …mn represents the coefficient of x1, x2, x3, …xn respectively
    • c is the intercept/constant
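
To make both formulas concrete, here is a minimal NumPy sketch with illustrative values (m, c, M, and X below are made-up numbers, not from the dataset used later in this post):

# A minimal NumPy sketch of both formulas (illustrative values)
import numpy as np

# Simple linear regression: y = mx + c
m, c = 2.0, 1.0
x = np.array([1.0, 2.0, 3.0])
y_simple = m * x + c                 # array([3., 5., 7.])

# Multiple linear regression: y = m1x1 + m2x2 + ... + mnxn + c
M = np.array([0.5, -1.0, 2.0])       # coefficients m1, m2, m3
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # two samples, three features
y_multi = X @ M + c                  # array([ 5.5, 10. ])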

Loss/Cost Function:

  • It is the function that signifies how much the predicted values deviate from the actual values.
  • MSE (Mean Squared Error) is the most commonly used cost function for linear regression.
  • MSE is the average of the squared differences between the predicted and actual values.
  • The output of MSE is a single number representing the cost.

MSE = (1/n) Σ (yi − yi_pred)²

Replacing yi_pred with m·xi + c gives:

MSE = (1/n) Σ (yi − (m·xi + c))²
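
A quick NumPy sketch of this computation, using illustrative values:

# Computing MSE by hand (illustrative values)
import numpy as np

y_actual = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 6.0])

mse = np.mean((y_actual - y_pred) ** 2)  # (0.25 + 0.25 + 1.0) / 3 = 0.5
print(mse)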

Gradient Descent:

  • It is an optimization algorithm used to find the optimal values of the parameters that minimize the cost function.

  • If we update the parameters of a cost function in the direction of the negative gradient, in an iterative manner, until we reach the minimum of that cost function, the procedure is called the gradient descent algorithm.

  • It helps find the optimal value of the slope m, which gives the best-fit line.

  • Our aim is to minimize the error between the predicted values and the actual values.

  • The gradient descent curve plots the cost function against the slope values.

  • This algorithm starts with a randomly selected m value and from there uses calculus to iteratively adjust m, calculating the cost function for each slope (a minimal NumPy sketch of this loop appears after the figure below).

  • It then takes all the error values, searches for the minimum error, and creates the best-fit line using that m.

  • The randomly selected m might not land at the global minimum, so we need to keep moving down, and for that we rely on the idea of convergence.

    • N.B.: A convergence theorem is a mathematical concept that describes the behavior of a sequence or series of values as it approaches a specific limit (examples include the Monotone Convergence Theorem, the Cauchy Convergence Theorem, and the Bolzano-Weierstrass Theorem). There is no specific named convergence theorem for gradient descent in linear regression; here, convergence simply means the iterative updates approach the minimum of the cost function.
  • The learning rate should be a small value, typically between 0.1 and 0.0000001. It controls how large a step the algorithm takes during gradient descent. Setting it too high makes the path unstable, setting it too low makes convergence slow, and setting it to zero means the model learns nothing from the gradients.
  • To find the derivative at a given slope value, we draw a tangent to the cost curve at that point.

  • If the right-hand side of the tangent points downward, the slope is negative, so the derivative at that point is also negative. Hence we need to increase the m value to move towards the global minimum.

  • If the right-hand side of the tangent points upward, the slope is positive, so the derivative at that point is also positive. Hence we need to decrease the m value to move towards the global minimum.

                   Fig: Gradient descent algorithm
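
Below is a minimal sketch of this loop in NumPy, assuming toy data generated from y = 2x + 1 and an illustrative learning rate and iteration count:

# Minimal gradient descent for y = mx + c on toy data
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])   # generated from y = 2x + 1

m, c = 0.0, 0.0                      # starting values
lr = 0.01                            # learning rate
n = len(x)

for _ in range(5000):
    y_pred = m * x + c
    # Partial derivatives of MSE with respect to m and c
    dm = (-2 / n) * np.sum(x * (y - y_pred))
    dc = (-2 / n) * np.sum(y - y_pred)
    # Step in the direction of the negative gradient
    m -= lr * dm
    c -= lr * dc

print(m, c)                          # approaches m ≈ 2, c ≈ 1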

Assumption of Linear Regression Model:

  • Linearity: A linear relationship exists between the dependent and independent variables.

    In case of non-linearity, use a transformation such as logarithmic, exponential, or square root.

  • No multicollinearity: If multicollinearity is present, it is unclear which independent variable explains the dependent variable. A quick way to check this is sketched below.

  • Normality of errors: Errors should be normally distributed; if not, confidence intervals may become too wide or too narrow.
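
One common multicollinearity check is the variance inflation factor (VIF) from statsmodels; here is a sketch on made-up data (in practice you would pass the real feature matrix, such as the advertising features used later in this post):

# Sketch: checking multicollinearity with VIF (illustrative data)
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

features = pd.DataFrame({'x1': [1.0, 2.0, 3.0, 4.0, 5.0],
                         'x2': [2.0, 1.0, 4.0, 3.0, 5.0]})
vif = pd.Series([variance_inflation_factor(features.values, i)
                 for i in range(features.shape[1])],
                index=features.columns)
print(vif)  # rule of thumb: VIF above 5-10 suggests multicollinearity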

Pros:

  • Simple method
  • Easy to use and understand

Cons:

  • Very sensitive to outliers
  • Performs well only when the relationship between the variables is linear

Python Implementation for Linear Regression:

Business Case: To predict total sales using features that capture the money spent on marketing for each advertising channel.

Feature descriptions received from the domain expert:

  1. TV: Amount spent on TV advertisement.
  2. Radio: Amount spent on radio advertisement.
  3. Newspaper: Amount spent on newspaper advertisement.
  4. Sales: Sales of the product.
# importing basic libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

# Load dataset

sales_data = pd.read_csv('/content/drive/MyDrive/Data Science/CDS-07-Machine Learning & Deep Learning/06. Machine Learning Model /01_Linear Regression/Linear Regression Class/Data Set/Advertising.csv')

Basic Checks & Domain Analysis:

sales_data.head()
Output:
   Unnamed: 0     TV  Radio  Newspaper  Sales
0           1  230.1   37.8       69.2   22.1
1           2   44.5   39.3       45.1   10.4
2           3   17.2   45.9       69.3    9.3
3           4  151.5   41.3       58.5   18.5
4           5  180.8   10.8       58.4   12.9
sales_data.tail()
Output:
    Unnamed: 0     TV  Radio  Newspaper  Sales
195         196   38.2    3.7       13.8    7.6
196         197   94.2    4.9        8.1    9.7
197         198  177.0    9.3        6.4   12.8
198         199  283.6   42.0       66.2   25.5
199         200  232.1    8.6        8.7   13.4
sales_data.describe()
Output:
       Unnamed: 0          TV       Radio   Newspaper       Sales
count  200.000000  200.000000  200.000000  200.000000  200.000000
mean   100.500000  147.042500   23.264000   30.554000   14.022500
std     57.879185   85.854236   14.846809   21.778621    5.217457
min      1.000000    0.700000    0.000000    0.300000    1.600000
25%     50.750000   74.375000    9.975000   12.750000   10.375000
50%    100.500000  149.750000   22.900000   25.750000   12.900000
75%    150.250000  218.825000   36.525000   45.100000   17.400000
max    200.000000  296.400000   49.600000  114.000000   27.000000
sales_data.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  200 non-null    int64  
 1   TV          200 non-null    float64
 2   Radio       200 non-null    float64
 3   Newspaper   200 non-null    float64
 4   Sales       200 non-null    float64
dtypes: float64(4), int64(1)
memory usage: 7.9 KB
sales_data.shape
Output:
(200, 5)

Observation from basic checks

  1. The Unnamed: 0 column is an ID column that is not needed for modeling, so we will drop it
  2. The dataset has 200 rows and 5 columns
  3. There are 3 independent variables (TV, Radio, Newspaper)
  4. There is 1 dependent (target) variable, Sales, which depends on the other 3 independent variables
  5. The four variables are not on the same scale, so scaling would be required
  6. No missing values are present in the dataset
  7. Spending on TV advertisement is high compared to radio and newspaper
  8. No categorical values are present in the dataset

Exploratory Data Analysis:

Step 01: Univariate Analysis

# Analyzing TV
sns.histplot(x=sales_data["TV"],kde=True)

# Analyzing Radio
sns.histplot(x=sales_data["Radio"],kde=True)

# Analyzing Newspaper
sns.histplot(x=sales_data["Newspaper"],kde=True)

Observation from Univariate Analysis

  1. No clear pattern in the TV & Radio data
  2. The Newspaper data is right-skewed

Step 02: Bivariate Analysis

# Analyzing TV And Sales

sns.relplot(x='TV',y='Sales',data=sales_data)

# Analyzing Radio And Sales

sns.relplot(x='Radio',y='Sales',data=sales_data)

# Analyzing Newspaper And Sales

sns.relplot(x='Newspaper',y='Sales',data=sales_data)

Observation from Bivariate Analysis

  1. Marketing spend on TV leads to more sales of the product
  2. No specific trend shows between Radio advertising and sales
  3. No specific trend shows between Newspaper advertising and sales

Step 03: Multivariate Analysis

# Analyzing all 3 independent variables against the Sales variable

sns.pairplot(sales_data.drop('Unnamed: 0',axis=1))

Data Preprocessing and Feature Engineering:

Step 01: Imputing Missing values

  • As there are no missing values in the dataset, we skip this step

Step 02: Converting categorical data to numerical data

  • As there is no categorical data, we skip this step

Step 03: Checking & handling Outliers

# Checking for TV data 

sns.boxplot(x='TV',data=sales_data)

# Checking for Radio data 

sns.boxplot(x='Radio',data=sales_data)

# Checking for Newspaper data 

sns.boxplot(x='Newspaper',data=sales_data)

Observation from Checking Outlier

  • Only the Newspaper data has outliers

  • We are not removing these outliers for now (a quick way to count them is sketched below)
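
For reference, the outliers visible in the Newspaper boxplot can be counted with the 1.5 × IQR rule:

# Counting Newspaper outliers with the 1.5 * IQR rule
q1 = sales_data['Newspaper'].quantile(0.25)
q3 = sales_data['Newspaper'].quantile(0.75)
iqr = q3 - q1
outliers = sales_data[(sales_data['Newspaper'] < q1 - 1.5 * iqr) |
                      (sales_data['Newspaper'] > q3 + 1.5 * iqr)]
print(len(outliers))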

Step 04: Scaling down the continuous variable

  • Although our variables are not all on the same scale, we are not applying a scaling technique now, to keep this blog simple (a sketch of what it would look like follows)
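
If we did scale, it could look like this with scikit-learn's StandardScaler (not applied to the data used in the rest of this post):

# Sketch: standardizing the three feature columns (not applied here)
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled = scaler.fit_transform(sales_data[['TV', 'Radio', 'Newspaper']])
# `scaled` is a NumPy array with mean ~0 and std ~1 per column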

Step 05: Transformation

  • As our data is not normally distributed, we should transform it towards a normal distribution. We are not doing so now, to keep this blog simple (an example transform is sketched below)
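
As an example, the right-skewed Newspaper column could be log-transformed (again, not applied to the data used below):

# Sketch: log transform for the right-skewed Newspaper column (not applied here)
newspaper_log = np.log1p(sales_data['Newspaper'])  # log(1 + x) avoids log(0)
sns.histplot(x=newspaper_log, kde=True)            # distribution becomes less skewed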

Feature Selection:

Step 01: Dropping the unwanted variables

sales_data.drop('Unnamed: 0',axis=1,inplace=True)
sales_data
Output:
        TV  Radio  Newspaper  Sales
0    230.1   37.8       69.2   22.1
1     44.5   39.3       45.1   10.4
2     17.2   45.9       69.3    9.3
3    151.5   41.3       58.5   18.5
4    180.8   10.8       58.4   12.9
..     ...    ...        ...    ...
195   38.2    3.7       13.8    7.6
196   94.2    4.9        8.1    9.7
197  177.0    9.3        6.4   12.8
198  283.6   42.0       66.2   25.5
199  232.1    8.6        8.7   13.4

[200 rows x 4 columns]

Step 02: Checking the Correlation

  • We will use a heatmap here
sns.heatmap(sales_data.drop("Sales",axis=1).corr(),annot=True) # dropping Sales as it is the output variable

Observation from correlation

  • No feature is highly correlated with any other feature

  • So we will use all 3 features as input

Model Creation:

Step 01: Creating independent & dependent variable

  • Conventionally the independent variables are represented by X

  • Conventionally the dependent variable is represented by y

X = sales_data.iloc[:,0:3]
y = sales_data.iloc[:,3]

Step 02: Creating Training & Testing data

from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2 ,random_state=45)

Step 03: Creating Model

from sklearn.linear_model import LinearRegression

model = LinearRegression() # object creation

model.fit(X_train,y_train) # training linear regression

y_predict= model.predict(X_test)
# See predicted y values & actual y values

print(y_predict)
print('----------------------------------')
print(np.array(y_test))
Output:
[15.18887309 10.2054111  16.43931961 21.80818887 15.88752137  8.92680199
 18.13567301 11.36589433 17.39755473  8.66950442 11.4822015   9.719351
 12.1396776  19.13491661 16.94206504  6.52793621 14.05605199  7.77833624
 21.09549852 12.35393889 19.24140535  7.51159355 17.35753103 10.14557775
 17.14293028  7.03827428 20.44646647 12.24372302 15.01515604 14.31985601
 23.18859932 20.39708782 19.89616957 16.52262551  9.97604212 10.09042996
 16.8580678  18.25948647 13.17938188 19.53806065]
----------------------------------
[14.9  8.8 16.6 23.8 12.   9.7 19.  11.8 18.5  8.5 10.8 10.1 11.7 17.4
 15.7  8.7 14.1  9.7 22.3 10.8 19.6  7.6 12.8 10.1 17.3  8.6 20.7 11.7
 15.  14.5 25.4 22.1 19.8 17.3 11.6 11.3 18.  15.  12.9 18.9]
y_predict.shape
Output:
(40,)
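
To connect the fitted model back to the formula y = m1x1 + m2x2 + m3x3 + c, we can inspect the learned coefficients and intercept (the exact values depend on the random split, so none are shown here):

# Inspecting the learned parameters m1, m2, m3 and c
print(model.coef_)       # coefficients for TV, Radio, Newspaper
print(model.intercept_)  # intercept c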

Model Evaluation:

from sklearn.metrics import r2_score,mean_squared_error,mean_absolute_error
# r2 score

r2 = r2_score(y_test,y_predict)
r2 
Output:
0.8955882331233612
X_test.shape
Output:
(40, 3)
# Adjusted r2 score

adjusted_r2 = 1-(1-r2)*(40-1)/(40-3-1)
adjusted_r2
Output:
0.886887252550308

Adjusted R2 = 1 – [(1-R2)*(n-1)/(n-k-1)]

where:

  • R2: The R2 of the model
  • n: The number of observations
  • k: The number of predictor variables
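
The hard-coded 40 and 3 above come from X_test.shape; a reusable version of the same formula could look like this:

# Sketch: adjusted R2 derived from the test-set shape instead of hard-coded numbers
n, k = X_test.shape  # n observations, k predictor variables
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(adjusted_r2)   # same value as above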
# Mean Squared Error (MSE)

MSE = mean_squared_error(y_test,y_predict)
MSE
Output:
2.256494247280935
# Root Mean Squared Error (RMSE)

import math

RMSE = math.sqrt(MSE)
RMSE
Output:
1.5021631892976657
# Mean Absolute Error (MAE)

MAE = mean_absolute_error(y_test,y_predict)
MAE
Output:
1.0788802763848646
