What is Machine Learning?

  • Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed.
  • It is a branch of artificial intelligence (AI) and computer science that focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.
  • Some innovative products are based on machine learning, such as Netflix’s recommendation engine and self-driving cars.
  • Machine learning is an important component of the growing field of data science.
  • Using statistical methods, algorithms are trained to make classifications or predictions, uncovering key insights in data mining projects.
  • Difference between traditional programming and machine learning (a minimal sketch follows below):
    • Traditional programming: data + rules = output
    • Machine learning: data + output = rules
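
As a minimal sketch of this contrast (the temperature-conversion example is ours, purely for illustration):

```python
from sklearn.linear_model import LinearRegression

# Traditional programming: we write the rule ourselves.
def fahrenheit(celsius):
    return celsius * 9 / 5 + 32           # the rule is hard-coded

# Machine learning: we supply data + outputs and let the model learn the rule.
X = [[0], [10], [20], [30], [40]]         # inputs (Celsius)
y = [32, 50, 68, 86, 104]                 # known outputs (Fahrenheit)

model = LinearRegression().fit(X, y)      # learns slope ~1.8, intercept ~32
print(model.predict([[25]]))              # ~[77.]
```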

Types of Machine Learning:

Supervised ML:

Supervised learning is a type of machine learning where the model is trained on labeled data, meaning each input has a corresponding correct output. The goal is to learn a mapping from inputs to outputs.

Types of Supervised Learning:

  • Classification: Predicts a category or class label.

Example: Email spam detection (Spam or Not Spam).

  • Regression: Predicts a continuous numerical value.

Example: Predicting house prices based on features like size and location.

Methods of Supervised Learning:

  • Classification Algorithms
    1. Logistic Regression – Used for binary classification.
    2. Decision Trees – Splits data based on feature values.
    3. Random Forest – A collection of multiple decision trees.
    4. Support Vector Machines (SVM) – Finds the best decision boundary.
    5. Neural Networks (Deep Learning) – Used for complex classification tasks like image recognition.
  • Regression Algorithms
    1. Linear Regression – Models relationships using a straight line.
    2. Polynomial Regression – Models nonlinear relationships.
    3. Ridge & Lasso Regression – Used for regularization to prevent overfitting.
    4. Support Vector Regression (SVR) – Adapts SVM for continuous outputs.
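
To make both tasks concrete, here is a minimal scikit-learn sketch; the built-in datasets and models are chosen only for illustration:

```python
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: predict a class label (malignant vs. benign tumour)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Regression: predict a continuous value (a disease-progression score)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
reg = LinearRegression().fit(X_train, y_train)
print("regression R^2:", reg.score(X_test, y_test))
```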

Unsupervised ML:

Unsupervised learning is a type of machine learning where the model is trained on unlabeled data, meaning there are no predefined outputs. The goal is to identify patterns, structures, or relationships within the data.

Types of Unsupervised Learning:

  • Clustering: Grouping similar data points together based on their features.

Example: Customer segmentation in marketing.

  • Dimensionality Reduction: Reducing the number of features while retaining essential information.

Example: Principal Component Analysis (PCA) for image compression.

  • Association Rule Learning: Finding relationships between variables in large datasets.

Example: Market Basket Analysis (If a customer buys bread, they are likely to buy butter).

Methods of Unsupervised Learning:

  • Clustering Methods
    • K-Means Clustering – Groups data into K clusters based on similarity.
    • Hierarchical Clustering – Creates a tree-like structure of clusters.
    • DBSCAN (Density-Based Spatial Clustering) – Groups dense regions while ignoring noise.
  • Dimensionality Reduction Methods
    • PCA (Principal Component Analysis) – Reduces features while preserving variance.
    • t-SNE (t-Distributed Stochastic Neighbor Embedding) – Visualizes high-dimensional data in 2D or 3D.
    • Autoencoders – Neural networks used for feature compression and learning.
  • Association Rule Learning Methods
    • Apriori Algorithm – Finds frequent item sets in large datasets.
    • FP-Growth Algorithm – A faster alternative to Apriori for rule mining.
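
A small sketch of clustering and dimensionality reduction with scikit-learn (the Iris dataset is used only as an illustration, with its labels ignored):

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)          # labels are ignored: unsupervised

# Clustering: group similar flowers into 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.labels_[:10])                 # cluster assignments of the first samples

# Dimensionality reduction: compress 4 features into 2 while preserving variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape, pca.explained_variance_ratio_)
```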

Semi-Supervised Machine Learning:

Semi-supervised learning is a hybrid approach that combines both labeled and unlabeled data for training. Usually, a small portion of the dataset is labeled, while the majority remains unlabeled. This technique is useful when labeling data is expensive or time-consuming.

Types of Semi-Supervised Learning:

  • Self-Training: The model is first trained on a small set of labeled data. Then, it predicts labels for the unlabeled data and re-trains itself iteratively.

Example: Medical imaging, fraud detection.

  • Graph-Based Models: Data points are treated as nodes in a graph, and relationships between them help in learning.

Example: Social network analysis.

  • Generative Models: Uses probabilistic models to generate missing labels.

Example: Variational Autoencoders (VAEs).

  • Consistency Regularization: Encourages the model to produce similar outputs for similar inputs, even when small noise is added.

Example: Used in modern deep learning techniques.

Methods of Semi-Supervised Learning:

  • Transductive Learning: Learns from labeled data and generalizes only to the given unlabeled data.

Example: Speech-to-text models trained on a limited dataset.

  • Inductive Learning: Learns a general rule from labeled data and applies it to unseen data.

Example: Email spam classification where some emails are labeled, but new emails keep coming in.
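
A rough sketch of the self-training idea using scikit-learn's SelfTrainingClassifier; masking roughly 90% of the labels here is an arbitrary choice for illustration:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Pretend only ~10% of the labels are known; unlabeled samples are marked -1
rng = np.random.RandomState(42)
y_partial = y.copy()
y_partial[rng.rand(len(y)) > 0.1] = -1

base = SVC(probability=True, gamma="auto")          # base model must expose predict_proba
model = SelfTrainingClassifier(base).fit(X, y_partial)
print("accuracy on all data:", model.score(X, y))
```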

Reinforcement ML:

Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent takes actions, receives rewards or penalties, and learns a policy to maximize long-term rewards. RL is widely used in robotics, gaming, finance, self-driving cars, and healthcare.

Types of Reinforcement Learning:

  • Model-Based RL: The agent builds a model of the environment and predicts future outcomes before taking action. Used in robotics, AlphaGo, and decision-making systems.

Example: Dynamic Programming (DP)

  • Model-Free RL: The agent learns directly from experience without an explicit model. Used in game AI, robotics, and real-world applications.

Example: Q-Learning, SARSA

Methods of Reinforcement Learning:

  • Value-Based Methods: The agent maintains a value function that estimates the expected long-term reward for each state. The goal is to find the best action based on the estimated value.

Examples: Q-Learning (off-policy), SARSA (on-policy), Deep Q-Networks (DQN)

  • Policy-Based Methods: The agent directly learns a policy without using a value function. Used for continuous action spaces, like robotics and finance.

Examples: REINFORCE algorithm (Monte Carlo policy gradient), Actor-Critic methods (combining policy-based and value-based methods)

  • Model-Based Methods: The agent builds a model of the environment and uses it for planning. Used in robotics, strategic games, and decision-making systems.

Examples: Monte Carlo Tree Search (MCTS), AlphaGo’s Policy and Value Networks
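
The following is a minimal tabular Q-learning sketch (a value-based, model-free method) on a made-up five-state corridor environment; the states, rewards, and hyperparameters are all illustrative:

```python
import numpy as np

# A made-up "corridor" environment: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 yields reward +1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.1      # learning rate, discount, exploration

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = np.random.randint(N_ACTIONS)
        else:
            action = int(Q[state].argmax())
        next_state, reward, done = step(state, action)
        # Q-learning update: off-policy, uses the max over next actions
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q)   # the learned values favour action 1 (move right) in every state
```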

AI, ML, DL, DS difference :

  • Artificial Intelligence (AI) enables machines to think: to understand, learn from data, and take decisions based on patterns hidden in the data, or to make inferences that would otherwise be very difficult for humans to make manually. The end goal of using ML or DL is to create an AI application or machine that is as smart as a human.
    • Areas of Artificial Intelligence:
      1. Computer Vision
      2. Natural Language Processing
      3. Machine Learning & Deep Learning
      4. Decision Making
      5. Robotics
  • Machine Learning (ML) is a subset of AI; it provides us with statistical tools/techniques like Supervised, Unsupervised, and Reinforcement learning to explore and analyze the data.
  • Deep Learning (DL) is further a subset of ML, and the main idea behind it is to make machines learn by mimicking the human brain. Here, we create a multi-neural network architecture with the help of different techniques like ANN, CNN, and RNN.
  • Data Science (DS) is basically drawing insights from structured and unstructured data either by using ML or DL or without these techniques. We can even use different visualization tools, statistics, and probability to gain these insights.

General Life Cycle/Pipe Line For Machine Learning Projects:

Step 1:- Identify the business case and categorize the type of problem to solve, e.g. regression, classification, or time series analysis.

Step 2:- Data Collection

Data collection is a foundational step in the machine learning process. It involves gathering the relevant data that will be used to train and evaluate machine learning models. The quality and quantity of the collected data directly impact the model’s performance and accuracy.

Step 3:- Identify the independent and dependent variables.

Step 4:- Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a critical step in the data science process, where you analyze and visualize datasets to uncover patterns, relationships, and insights. EDA helps you understand the structure, distribution, and quality of the data, which informs the next steps in the modeling process.
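
A typical first pass at EDA with pandas might look like the sketch below; the file name and column names are hypothetical:

```python
import pandas as pd

# "house_prices.csv" and the column names below are hypothetical
df = pd.read_csv("house_prices.csv")

print(df.shape)                        # number of rows and columns
df.info()                              # column types and non-null counts
print(df.describe())                   # summary statistics for numeric columns
print(df.isnull().sum())               # missing values per column
print(df["price"].corr(df["size"]))    # correlation between two numeric columns
df.hist(figsize=(10, 8))               # quick distribution plots (needs matplotlib)
```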

Step 5:- Data Preprocessing

  • Remove unwanted columns, such as ID columns and columns where most values are missing
  • Impute missing values
  • Check for outliers
  • Convert categorical features to numerical
  • Feature scaling
  • Handle imbalanced datasets
  • Format the data
  • Clean the data (a few of these steps are sketched in the example below)
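
A rough sketch of a few of these preprocessing steps with pandas and scikit-learn; the file and column names are hypothetical:

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# "customers.csv", "customer_id", and "city" are hypothetical names
df = pd.read_csv("customers.csv")
df = df.drop(columns=["customer_id"])              # remove ID-like columns

# Impute missing numeric values with the median
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])

# Convert a categorical column to numerical (one-hot encoding)
df = pd.get_dummies(df, columns=["city"], drop_first=True)

# Feature scaling for the numeric columns
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
```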

Step 6:- Feature Engineering

Feature Engineering is the process of selecting, modifying, or creating new features (variables) from raw data to improve the performance of a machine learning model.

    • Feature Selection:
      • Filter Methods: Use statistical techniques (e.g., correlation, chi-square tests, correlation heatmaps) to select features that have the strongest relationships with the target variable.
      • Wrapper Methods: Use iterative processes (e.g., forward selection, backward elimination) to find the best subset of features based on model performance.
      • Embedded Methods: Feature selection is integrated into the model training process (e.g., Lasso regression).
    • Feature Transformation:
      • Normalization/Scaling: Adjust the range of numerical features to a common scale (e.g., min-max scaling, standardization) to ensure that all features contribute equally to the model.
      • Log Transformation: Apply a logarithmic function to skewed data to make it more normally distributed.
      • Binning: Convert continuous features into discrete bins or categories (e.g., age groups).
    • Encoding Categorical Variables:
      • One-Hot Encoding: Convert categorical variables into a series of binary columns (e.g., “red,” “blue,” “green” becomes separate binary columns).
      • Label Encoding: Assign numerical values to categorical variables (e.g., “red” = 1, “blue” = 2).
    • Feature Creation:
      • Polynomial Features: Generate new features by combining existing ones (e.g., creating interaction terms or higher-order polynomials).
      • Date/Time Features: Extract features from date and time (e.g., day of the week, month, hour) to capture seasonal patterns.
      • Domain-Specific Features: Use domain knowledge to create new features that capture important aspects of the data (e.g., creating a “body mass index” feature from height and weight).
    • Handling Missing Data:
      • Imputation: Fill in missing values with a specific value (e.g., mean, median) or use more advanced techniques like K-nearest neighbors (KNN) imputation.
      • Dummy Variables: Create a binary feature indicating whether a value was missing.
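
A short sketch of a few of these feature engineering techniques; the toy DataFrame and column names are invented for illustration:

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# A toy DataFrame with invented values
df = pd.DataFrame({
    "height_m": [1.70, 1.80, 1.65],
    "weight_kg": [70, 85, 60],
    "signup_date": pd.to_datetime(["2024-01-05", "2024-03-20", "2024-07-11"]),
    "color": ["red", "blue", "red"],
})

# Domain-specific feature: body mass index from height and weight
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

# Date/time features: capture seasonal patterns
df["signup_month"] = df["signup_date"].dt.month
df["signup_dayofweek"] = df["signup_date"].dt.dayofweek

# One-hot encode the categorical column
df = pd.get_dummies(df, columns=["color"])

# Polynomial/interaction features on the numeric columns
poly = PolynomialFeatures(degree=2, include_bias=False)
poly_features = poly.fit_transform(df[["height_m", "weight_kg"]])
print(poly.get_feature_names_out())
```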

Step 7:- Model Selection and Building

Selecting a model for the problem you are solving is a crucial step. There are two common ways to handle model selection:

  • Test all plausible algorithms on your data to see which works best. There are both pros and cons to this approach: you will know for certain which algorithm or set of algorithms is the better choice for your problem statement, but the approach becomes computationally costly on huge datasets.
  • Another approach is to understand what an algorithm does before deciding whether it is a good fit for your problem. Do not be afraid to go into the basics of the algorithm itself. The more you understand how the algorithm works and what its limitations are, the better your chances of identifying whether it is a good choice for your problem.

Once you have narrowed down your algorithms, the next step is to build the model and train it on your data.
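
A rough sketch of the first approach, comparing a few candidate algorithms with cross-validation (the candidate set and dataset are only examples):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "random forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)      # 5-fold cross-validation
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

Cross-validation gives a more reliable comparison than a single train/test split, because each candidate is evaluated on several different folds of the data.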

Step 8:- Hyperparameter tuning

Hyperparameters are settings of a model that the model cannot learn from the data by itself, so we need to tune them manually. Tuning these hyperparameters optimizes the performance of the model.

There are two approaches that are widely used to tune hyperparameters; based on the type of problem, you can go for the methods below:

  • GridSearchCV
  • RandomizedSearchCV
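
A minimal GridSearchCV sketch; the parameter grid below is illustrative, not a recommendation:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameters the model cannot learn on its own
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [None, 5, 10],
}

search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

RandomizedSearchCV has the same interface but samples a fixed number of parameter combinations, which is usually cheaper when the search space is large.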

Step 9:- Model Evaluation

Choose an evaluation metric pertinent to your problem. Many people go with accuracy in classification tasks merely because it is the easiest metric to understand, but it is not always the best choice (for example, on heavily imbalanced datasets). Here are some possible scores for binary classification problems.

  • Confusion matrix
  • Accuracy
  • Precision
  • Recall/Sensitivity
  • Specificity
  • F1 score
  • Log Loss
  • Precision-Recall or PR curve
  • ROC (Receiver Operating Characteristics) curve
  • PR vs ROC curve.

For regression problems, we can look at

  • MAE (Mean Absolute Error) / MSE (Mean Squared Error)
  • RMSE (Root Mean Squared Error)
  • R Squared/Adjusted R Squared
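
A compact sketch computing several of these metrics with scikit-learn (the datasets and models are placeholders):

```python
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score, f1_score, mean_squared_error, r2_score)

# Binary classification metrics
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
y_pred = LogisticRegression(max_iter=5000).fit(X_train, y_train).predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("accuracy:", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall:", recall_score(y_test, y_pred))
print("F1:", f1_score(y_test, y_pred))

# Regression metrics
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
y_pred = LinearRegression().fit(X_train, y_train).predict(X_test)
print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)
print("R squared:", r2_score(y_test, y_pred))
```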

Step 10:- Model Deployment

Model deployment is the process of taking a trained machine learning model and making it available in a production environment so it can be used to make predictions on new, unseen data. It’s the step where the model is integrated into applications, systems, or processes to provide real-world value.

  • Save the trained model in a format (like Pickle or ONNX) that can be loaded later.
  • Develop an API (e.g., RESTful API) that allows external systems to interact with the model.
  • Embed the model into the existing application or system where it will be used.
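
A minimal deployment sketch covering the first two bullets, using pickle and FastAPI; the toy model, the "model.pkl" file name, and the /predict route are our own choices for illustration:

```python
import pickle

from fastapi import FastAPI
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# 1) Train and save a model (a toy Iris classifier, purely for illustration)
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# 2) Load the saved model and expose it through a small REST API
with open("model.pkl", "rb") as f:
    loaded_model = pickle.load(f)

app = FastAPI()

@app.post("/predict")
def predict(features: list[float]):
    # Expects a JSON array of 4 numbers (sepal/petal measurements)
    prediction = loaded_model.predict([features])
    return {"prediction": int(prediction[0])}

# Run with: uvicorn main:app --reload
```

In practice, training and serving usually live in separate scripts or services; the saved model file (or a model registry) is the hand-off point between them.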

Tools to use:

  • Frameworks: TensorFlow Serving, TorchServe, ONNX Runtime.
  • APIs: Flask, FastAPI, Django (for creating web APIs).
  • Cloud Services: AWS SageMaker, Google AI Platform, Azure Machine Learning.
  • Containerization: Docker, Kubernetes (for scalable deployment).

Step 11:- Model Monitoring & Maintenance

Monitoring involves continuously tracking the performance of a deployed machine learning model. Key metrics include accuracy, latency, and prediction error rates.

Maintenance refers to the ongoing process of updating and improving the model after deployment. This can involve retraining the model with new data, fine-tuning hyperparameters, or even replacing the model if it becomes outdated.
