Step 1:-  Business Case and Problem Statement :

Identify the business case and categorize the type of problem to solve, e.g., regression, classification, or time series analysis.

Step 2:-  Importing Necessary Libraries :

Import libraries such as numpy, pandas, matplotlib, and seaborn. You can import other relevant libraries later in the project as needed.

Step 3:-  Data Collection :

Load the dataset with pandas, for example from a file or through a SQL query.
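As a minimal sketch of both loading routes: the inline CSV text and the in-memory SQLite table named customers below are placeholders invented for illustration, standing in for a real file path or database connection.

```python
import sqlite3
from io import StringIO

import pandas as pd

# Option 1: load from CSV. The inline text stands in for a real file path
# (hypothetical), e.g. pd.read_csv("data.csv").
csv_text = "id,age,purchased\n1,25,0\n2,40,1\n3,31,1\n"
df_csv = pd.read_csv(StringIO(csv_text))

# Option 2: load through a SQL query. An in-memory SQLite database stands in
# for a real database; the table name "customers" is made up for this example.
conn = sqlite3.connect(":memory:")
df_csv.to_sql("customers", conn, index=False)
df_sql = pd.read_sql_query(
    "SELECT id, age, purchased FROM customers WHERE age > 30", conn
)
conn.close()

print(df_csv.shape)
print(df_sql)
```

For other databases, pandas can read through a SQLAlchemy connection in the same way.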

Step 4:-  Exploratory Data Analysis :

Draw insights from the data and, with the help of domain knowledge, find out which variables affect the target variable.
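A first pass at this step might look like the sketch below. The dataset is synthetic, and the column names (area, rooms, noise, price) are assumptions made up for the example. Correlation with the target is only a starting point and does not replace domain knowledge.

```python
import numpy as np
import pandas as pd

# Synthetic data: "price" depends on "area" and "rooms" but not on "noise".
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "area": rng.normal(100, 20, n),
    "rooms": rng.integers(1, 6, n),
    "noise": rng.normal(0, 1, n),
})
df["price"] = 3.0 * df["area"] + 10.0 * df["rooms"] + rng.normal(0, 5, n)

# Summary statistics and missing-value counts per column
print(df.describe())
print(df.isna().sum())

# Correlation of each feature with the target, strongest first
target_corr = (
    df.corr()["price"]
    .drop("price")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(target_corr)
```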

Step 5:-  Data Preprocessing & Cleaning:

  • Remove unwanted columns such as IDs and columns where most values are missing
  • Impute missing values
  • Check for outliers
  • Convert categorical variables to numerical ones
  • Scale the features
  • Handle imbalanced datasets
  • Format the data
  • Clean the data
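Several of the steps above can be sketched on a toy table as follows. The column names, the 50% missing-value threshold, the median/mode imputation, and the 1.5 × IQR clipping rule are illustrative assumptions, not fixed rules. Imbalance handling is not covered here. In a real project, fit the imputation and scaling on the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "age": [25.0, 32.0, np.nan, 41.0, 29.0, 120.0],  # one missing, one outlier
    "city": ["A", "B", "A", None, "B", "A"],
    "mostly_empty": [np.nan, np.nan, np.nan, np.nan, 1.0, np.nan],
})

# Remove the id column and any column with more than half its values missing
df = df.drop(columns=["id"])
df = df.loc[:, df.isna().mean() <= 0.5]

# Impute: median for the numeric column, most frequent value for the categorical
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Outliers: clip values outside 1.5 * IQR from the quartiles
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
age_upper = q3 + 1.5 * iqr
df["age"] = df["age"].clip(q1 - 1.5 * iqr, age_upper)

# Categorical to numerical: one-hot encoding
df = pd.get_dummies(df, columns=["city"], dtype=int)

# Feature scaling: zero mean, unit variance
df["age"] = StandardScaler().fit_transform(df[["age"]]).ravel()

print(df)
```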

Step 6:-  Feature Selection & Engineering :

Feature selection is the process of selecting a subset of relevant features (variables, attributes) from the original set of features in a dataset.

The goal of feature selection is to choose the most informative and important features while ignoring irrelevant or redundant ones.

Feature engineering is the process of creating new features or transforming existing features in a dataset to improve the performance of machine learning models.
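To make the two ideas concrete, here is a small sketch on synthetic data. The feature names and the choice of scikit-learn's SelectKBest with f_regression are assumptions for illustration, and many other selection methods exist. It engineers an area feature from length and width, then keeps the three highest-scoring features.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: the target depends on length * width; the noise columns are irrelevant.
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "length": rng.uniform(1, 10, n),
    "width": rng.uniform(1, 10, n),
    "noise_1": rng.normal(size=n),
    "noise_2": rng.normal(size=n),
})
y = X["length"] * X["width"] + rng.normal(0, 1, n)

# Feature engineering: create a new feature from existing ones
X["area"] = X["length"] * X["width"]

# Feature selection: keep the k features with the highest univariate F-scores
selector = SelectKBest(score_func=f_regression, k=3).fit(X, y)
selected = list(X.columns[selector.get_support()])
print(selected)
```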

Step 7:-  Model Selection and Building :

There are two approaches to model selection:

  • Test every candidate algorithm on your data to see which works best. This approach is computationally costly on large datasets.
  • Understand what each algorithm does before deciding whether it fits your problem. The better you understand how an algorithm works and where its limitations lie, the better you can judge whether it is a good choice.

Once you have narrowed down your algorithms, the next step is to build the model and train it on your data.
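The first approach, applied to a short list of candidates, can be sketched as below. The synthetic dataset and the three scikit-learn classifiers are arbitrary choices for illustration. The sketch compares candidates by cross-validation on the training split, then trains the best one and checks it on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data stands in for a real dataset
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}

# Mean 5-fold cross-validated accuracy for each candidate
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Build the chosen model on the full training split and check it on held-out data
best_model = candidates[best_name].fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)
print(cv_scores, best_name, test_accuracy)
```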

Step 8:-  Hyperparameter Tuning :

Hyperparameter tuning is the process of setting values for model parameters that the model cannot learn from the data by itself, so we have to choose them before training. Well-chosen hyperparameters improve the performance of the model.

Two approaches are widely used to tune hyperparameters. Depending on the type of problem, you can use either of the methods below:

  • GridSearchCV
  • RandomizedSearchCV
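Both searches can be sketched with scikit-learn as follows. The decision tree, the parameter ranges, and the synthetic data are assumptions chosen only to keep the example small. GridSearchCV tries every combination in the grid, while RandomizedSearchCV samples a fixed number of combinations from the given distributions.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = DecisionTreeClassifier(random_state=0)

# Grid search: evaluates all 3 x 2 = 6 combinations with 3-fold cross-validation
grid = GridSearchCV(
    model,
    param_grid={"max_depth": [2, 4, None], "min_samples_leaf": [1, 5]},
    cv=3,
)
grid.fit(X, y)

# Randomized search: samples n_iter=5 combinations from the distributions
rand = RandomizedSearchCV(
    model,
    param_distributions={
        "max_depth": randint(2, 10),
        "min_samples_leaf": randint(1, 10),
    },
    n_iter=5,
    cv=3,
    random_state=0,
)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```

Randomized search is usually the cheaper option when the search space is large, since its cost is set by n_iter rather than by the size of the grid.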

Step 9:-  Model Evaluation :

Here are some possible metrics for binary classification problems:

  • Confusion matrix
  • Accuracy
  • Precision
  • Recall
  • Specificity
  • F1 score
  • Precision-Recall or PR curve
  • ROC (Receiver Operating Characteristic) curve
  • Comparison of PR and ROC curves
  • Log loss
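Most of these metrics are available in scikit-learn. The sketch below uses hand-made labels and scores, and the 0.5 decision threshold is an assumption. Specificity has no dedicated function there, so it is derived from the confusion matrix.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_recall_curve,
    precision_score,
    recall_score,
    roc_auc_score,
    roc_curve,
)

# Toy ground-truth labels and predicted probabilities for the positive class
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_score = [0.1, 0.55, 0.35, 0.8, 0.9, 0.7, 0.3, 0.6, 0.85, 0.2]
y_pred = [int(s >= 0.5) for s in y_score]  # assumed threshold of 0.5

# Confusion matrix entries for the binary case
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
specificity = tn / (tn + fp)  # true negative rate
f1 = f1_score(y_true, y_pred)

# Curves are computed from the scores, not from the thresholded predictions
pr_precision, pr_recall, _ = precision_recall_curve(y_true, y_score)
fpr, tpr, _ = roc_curve(y_true, y_score)
roc_auc = roc_auc_score(y_true, y_score)

print(accuracy, precision, recall, specificity, f1, roc_auc)
```

The PR curve is often the more informative of the two curves when the positive class is rare, because it ignores true negatives.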

For regression problems, we can look at:

  • MAE / MSE
  • RMSE
  • R Squared/Adjusted R squared
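As a small worked example of RMSE and R squared, the sketch below uses four made-up values. Adjusted R squared is computed by hand from the usual formula, with p (the number of predictors) assumed to be 1.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 10.0])

n = len(y_true)  # number of samples
p = 1            # number of predictors (assumed for this example)

# RMSE: square root of the mean squared error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# R squared: share of the target's variance explained by the predictions
r2 = r2_score(y_true, y_pred)

# Adjusted R squared: penalizes R squared for the number of predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(rmse, r2, adj_r2)
```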



I am an enthusiastic advocate for the transformative power of data in the fashion realm. Armed with a strong background in data science, I am committed to revolutionizing the industry by unlocking valuable insights, optimizing processes, and fostering a data-centric culture that propels fashion businesses into a successful and forward-thinking future. - Masud Rana, Certified Data Scientist, IABAC

© Data4Fashion 2023-2024
