Life Cycle / Pipeline / Workflow for Machine Learning Projects

Step 1:- Business Case and Problem Statement :

Identify the business case and categorize the type of problem to solve, e.g. regression, classification, or time series analysis.

Step 2:- Importing Necessary Libraries :

Import libraries such as numpy, pandas, matplotlib, and seaborn. You can import other relevant libraries later in the project as needed.

Step 3:- Data Collection :

Load the dataset with pandas or through a SQL query.
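
Both loading routes can be sketched in a few lines. This is a minimal sketch, not a prescribed setup: the CSV text, the `customers` table, and the column names are hypothetical stand-ins for a real file and a real database, and an in-memory SQLite database is assumed so the example runs on its own.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "customer_id,age,spend\n1,34,120.5\n2,45,80.0\n3,29,95.2\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical in-memory SQLite database standing in for a real server.
conn = sqlite3.connect(":memory:")
df_csv.to_sql("customers", conn, index=False)

# Load only the rows and columns needed, using a SQL query.
df_sql = pd.read_sql_query(
    "SELECT customer_id, spend FROM customers WHERE age > 30", conn
)
conn.close()

print(df_csv.shape)
print(df_sql)
```

With a real project, `pd.read_csv` would point at a file path and the connection would point at the actual database.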

Step 4:- Exploratory Data Analysis :

Draw insights from the data and, with the help of domain knowledge, find out which variables affect the target variable.
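
A first pass at this step might look like the sketch below. The small housing-style table and its column names (`area_sqft`, `age_years`, `price`) are made up for illustration, and correlation is only one rough signal; it does not replace domain knowledge.

```python
import pandas as pd

# Hypothetical toy dataset; "price" plays the role of the target variable.
df = pd.DataFrame({
    "area_sqft": [800, 950, 1200, 1500, 1800, 2100],
    "age_years": [30, 25, 15, 10, 5, 2],
    "price": [100, 120, 160, 200, 240, 290],
})

# Summary statistics for every numeric column.
summary = df.describe()

# Number of missing values per column.
missing_counts = df.isna().sum()

# Correlation of each feature with the target, strongest positive first.
target_corr = df.corr()["price"].drop("price").sort_values(ascending=False)

print(summary)
print(missing_counts)
print(target_corr)
```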

Step 5:- Data Preprocessing & Cleaning:

Remove unwanted columns, such as IDs and columns where most of the values are missing
Impute missing values
Check for outliers
Convert categorical variables to numerical ones
Scale the features
Handle imbalanced datasets
Format the data
Clean the data
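
Several of these steps can be sketched with pandas and scikit-learn. The toy table, its column names, the median imputation strategy, and the 1.5 x IQR outlier rule are all assumptions chosen for illustration. Imbalance handling is not shown here.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy dataset with an id column, missing values, and one extreme income.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "age": [25, 32, np.nan, 41, 38, 29],
    "income": [40_000, 52_000, 61_000, np.nan, 58_000, 1_000_000],
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Delhi", "Mumbai"],
})

# Remove the identifier column, which carries no predictive information.
df = df.drop(columns=["id"])

# Check for outliers with the 1.5 * IQR rule (one common convention).
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)

# Impute and scale the numeric columns; one-hot encode the categorical column.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,  # always return a dense array
)
X = preprocess.fit_transform(df)

print(int(is_outlier.sum()), "outlier(s) flagged")
print(X.shape)
```

Wrapping the steps in a `ColumnTransformer` means the same transformations can later be applied to new data with `preprocess.transform`.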

Step 6:- Feature Selection & Engineering :

Feature selection is the process of selecting a subset of relevant features (variables, attributes) from the original set of features in a dataset.

The goal of feature selection is to choose the most informative and important features while ignoring irrelevant or redundant ones.

Feature engineering is the process of creating new features or transforming existing features in a dataset to improve the performance of machine learning models.
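
Both ideas can be illustrated on synthetic data. In this sketch the dataset comes from `make_classification`, the interaction feature `f0_x_f1` is a hypothetical engineered feature, and `SelectKBest` with an ANOVA F-test is just one of many selection methods.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 6 features, of which 3 are informative.
X_arr, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])

# Feature engineering: add a hypothetical interaction feature.
X["f0_x_f1"] = X["f0"] * X["f1"]

# Feature selection: keep the 3 features with the highest F-test scores.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)
selected_names = list(X.columns[selector.get_support()])

print(selected_names)
print(X_selected.shape)
```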

Step 7:- Model Selection and Building :

There are two approaches to model selection:

Test all candidate algorithms on your data to see which works best. This approach is computationally costly when you have huge datasets.
Understand what each algorithm does before deciding whether it is a good fit for your problem. The better you understand how an algorithm works and what its limitations are, the better your chances of judging whether it is a good choice.

Once you have narrowed down your algorithms, the next step is to build the model and train it on your data.
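
The first approach, trying several algorithms and comparing them, can be sketched with cross-validation. The Iris dataset and the three candidate models below are assumptions picked to keep the example small; any shortlist of estimators could be compared the same way.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical shortlist of candidate models.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

# Train the best-scoring candidate on the full dataset.
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("selected:", best_name)
```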

Step 8:- Hyperparameter Tuning :

Hyperparameter tuning is the process of setting the values of model parameters that the model cannot learn from the data by itself, so we have to choose them ourselves. Well-chosen hyperparameters improve the performance of the model.

Two approaches are widely used to tune hyperparameters. Depending on the type of problem, you can use either of the methods below:

GridSearchCV
RandomizedSearchCV
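
Both search strategies are available in scikit-learn's `model_selection` module. The sketch below assumes a random forest on the Iris dataset and a deliberately tiny search space; the parameter values are illustrative, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Hypothetical search space: 2 x 3 = 6 combinations.
param_grid = {"n_estimators": [25, 50], "max_depth": [2, 4, None]}

# Grid search tries every combination in the grid.
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)

# Randomized search samples a fixed number of combinations (here 4 of the 6).
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_grid,
    n_iter=4,
    cv=3,
    random_state=0,
)
rand.fit(X, y)

print("grid search best:", grid.best_params_, round(grid.best_score_, 3))
print("random search best:", rand.best_params_, round(rand.best_score_, 3))
```

Grid search is exhaustive and grows quickly with the number of parameters, while randomized search caps the cost at `n_iter` fits per fold, which is often the more practical choice for large search spaces.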

Step 9:- Model Evaluation :

Here are some possible metrics for binary classification problems.

Confusion matrix
Accuracy
Precision
Recall
Specificity
F1 score
Precision-Recall or PR curve
ROC (Receiver Operating Characteristic) curve
Comparison of the PR and ROC curves
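
Most of these metrics can be computed with `sklearn.metrics`. The labels and predicted scores below are made-up values, and the 0.5 decision threshold is an assumption. Specificity has no dedicated scikit-learn function, so it is derived from the confusion matrix.

```python
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical true labels and predicted probabilities for the positive class.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.9, 0.65, 0.3, 0.7, 0.55, 0.2]

# Turn scores into hard predictions with an assumed 0.5 threshold.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
specificity = tn / (tn + fp)  # true negative rate
f1 = f1_score(y_true, y_pred)

# Threshold-free summaries of the ROC and PR curves.
roc_auc = roc_auc_score(y_true, y_score)
pr_auc = average_precision_score(y_true, y_score)

print("confusion matrix (tn, fp, fn, tp):", tn, fp, fn, tp)
print("accuracy:", accuracy, "precision:", precision, "recall:", recall)
print("specificity:", specificity, "f1:", f1)
print("ROC AUC:", roc_auc, "average precision:", pr_auc)
```

The PR curve tends to be more informative than the ROC curve when the positive class is rare, because it does not reward the model for the large number of true negatives.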

Log loss is also useful for classifiers that output probabilities.

For regression problems, we can look at

RMSE
R squared / Adjusted R squared
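
RMSE and R squared can be computed with `sklearn.metrics`. Adjusted R squared is derived by hand here using the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1). The target values, predictions, and the feature count `n_features` are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true values and model predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0, 11.5])
n_features = 2  # assumed number of predictors used by the model

# RMSE: square root of the mean squared error, in the units of the target.
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# R squared: share of the target's variance explained by the model.
r2 = r2_score(y_true, y_pred)

# Adjusted R squared: penalizes R squared for the number of predictors.
n = len(y_true)
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

print(f"RMSE: {rmse:.4f}")
print(f"R squared: {r2:.5f}")
print(f"Adjusted R squared: {adjusted_r2:.4f}")
```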
