Addition Rule of Probability:
When calculating the probability of either of two events occurring, add the probability of each event and then subtract the probability of both events occurring together: P(A or B) = P(A) + P(B) − P(A and B)
If the two events are mutually exclusive, i.e., they cannot both happen, P(A and B) is 0. Therefore: P(A or B) = P(A) + P(B)
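As a quick numerical check (the 52-card deck example is an assumed illustration, not from the original notes):
# Hypothetical example: drawing a heart or a king from a standard 52-card deck
p_heart = 13 / 52             # P(A)
p_king = 4 / 52               # P(B)
p_heart_and_king = 1 / 52     # P(A and B): the king of hearts
print(p_heart + p_king - p_heart_and_king)   # 16/52 ≈ 0.3077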
Multiplication Rule of Probability:
If A and B are dependent events, then the probability of both events occurring is given by: P(A ∩ B) = P(B) · P(A|B)
If A and B are two independent events in an experiment, then the probability of both events occurring is given by: P(A ∩ B) = P(A) · P(B)
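A minimal sketch of both cases (the deck and coin examples are assumed illustrations):
# Dependent events: drawing two aces in a row without replacement
p_first_ace = 4 / 52                 # P(B)
p_second_given_first = 3 / 51        # P(A|B)
print(p_first_ace * p_second_given_first)   # ≈ 0.0045
# Independent events: two heads on two fair coin flips
print(0.5 * 0.5)   # 0.25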
Bernoulli Distribution:
Also called the binary distribution
The Bernoulli distribution is a discrete probability distribution that models a random experiment with two possible outcomes: success and failure.
It is often used to represent binary events, where the probability of success (usually denoted as “p”) and the probability of failure (which is complementary to p and denoted as “q”) are constant for each trial.
The probability mass function (PMF) of the Bernoulli distribution is defined as follows:
P(X = x) = p if x = 1, and q = 1 − p if x = 0
Where:
- P(X=x) is the probability that the random variable X takes the value x, which can be either 1 (success) or 0 (failure).
- p is the probability of success (e.g., the probability of an event occurring).
- q is the probability of failure (complementary to p, i.e.,q=1−p).
- In mathematical notation, you can represent the Bernoulli distribution as: X∼Bernoulli(p)
- This means that the random variable X follows a Bernoulli distribution with probability of success p.
- The expected value (mean) and variance of a random variable X following a Bernoulli distribution are as follows:
- Expected Value (Mean): E(X)=p
- Variance: Var(X)=p(1−p)
- Standard Deviation: σ = √(p(1 − p))
- These values describe the central tendency and variability of the Bernoulli distribution.
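A minimal sketch with scipy.stats.bernoulli confirming the PMF, mean, and variance (p = 0.7 is an assumed example value):
from scipy.stats import bernoulli
p = 0.7   # assumed probability of success
X = bernoulli(p)
print(X.pmf(1), X.pmf(0))   # p = 0.7 and q = 1 - p = 0.3
print(X.mean(), X.var())    # E(X) = p = 0.7, Var(X) = p(1 - p) = 0.21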
Binomial Distributions:
A binomial distribution describes the number of successes in a collection of n independent Bernoulli trials.
Trials being independent means the result of one trial is not affected by the results of the previous trials.
A distribution where only two outcomes are possible on each trial, such as success or failure, gain or loss, win or lose, and where the probability of success (and of failure) is the same for all trials, is called a Binomial Distribution.
The Bernoulli distribution deals with a single trial with two possible outcomes, while the binomial distribution deals with a fixed number of independent and identical trials.
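A short sketch with scipy.stats.binom (10 fair coin flips is an assumed example):
from scipy.stats import binom
n, p = 10, 0.5   # assumed example: 10 fair coin flips
print(binom.pmf(4, n, p))                  # P(exactly 4 successes) ≈ 0.2051
print(binom.mean(n, p), binom.var(n, p))   # n*p = 5.0, n*p*(1-p) = 2.5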
Poisson Distribution:
Poisson Distribution is applicable in situations where events occur at random points in time or space and our interest lies only in the number of occurrences of the event.
Example:
- The number of emergency calls recorded at a hospital in a day.
- The number of thefts reported in an area in a day.
- The number of customers arriving at a salon in an hour.
Some notations used in Poisson distribution are:
- λ is the rate at which an event occurs,
- t is the length of a time interval,
- X is the number of events in that time interval.
Here, X is called a Poisson Random Variable, and the probability distribution of X is called Poisson distribution.
Let µ denote the mean number of events in an interval of length t. Then, µ = λ*t.
The probability mass function (PMF) of the Poisson distribution is given by:
P(X = k) = (e^(−λ) · λ^k) / k! , k = 0, 1, …
Where:
- P(X = k) is the probability of observing k events.
- e is the base of the natural logarithm (approximately 2.71828).
- λ is the average rate of events in the given interval.
- k is the number of events you want to find the probability for.
- k! is the factorial of k.
Example: Suppose you work at a call center, and on average, you receive 5 customer service calls per hour. You want to calculate the probability of receiving a specific number of calls in the next hour.
Probability of receiving exactly 3 calls in the next hour (k = 3 and λ = 5):
P(X = 3) = (e^(−5) · 5^3) / 3! ≈ 0.1404
So, there is roughly a 14.04% chance of receiving exactly 3 calls in the next hour.
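The hand calculation can be verified with scipy.stats.poisson as a quick sketch:
from scipy.stats import poisson
print(poisson.pmf(3, 5))   # P(X = 3) with λ = 5 ≈ 0.1404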
Normal Distribution:
Also known as Gaussian distribution
Example: Age, weight, height, Iris dataset
Importance:
- Many hypothesis tests assume the data follow it
- Linear and non-linear regression assume the residuals follow it
- The central limit theorem states that as the sample size increases, the distribution of the sample mean approaches a normal distribution irrespective of the distribution of the original variable
- Most statistical software programs provide probability functions for the normal distribution
Parameters: Two main parameters, changes to which change the shape of the distribution:
- Mean: Determines the location of the peak; data points cluster around the mean. Changing the mean moves the curve left or right along the X-axis.
- Standard Deviation: Determines how far data points lie from the mean, i.e., the typical distance between the mean and the data points. Decreasing the SD tightens the distribution (steeper curve) and increasing it widens the distribution (flatter curve) along the X-axis.
Properties:
- Symmetric: Equal numbers of observations lie on each side of the mean.
- mean = median = mode: All three measures of central tendency coincide at the midpoint.
- Empirical Rule: Almost all values lie within 3 standard deviations of the mean: about 68% within 1 SD, 95% within 2 SDs, and 99.7% within 3 SDs (the 68-95-99.7 or three-sigma rule).
- Skewness and kurtosis: Indicate how a distribution differs from the normal distribution. Skewness measures asymmetry and kurtosis measures the heaviness of the tails.
- Area under curve: 1
- Standard Normal distribution: μ = 0 and σ = 1
Distribution functions:
- Probability Density Function
- Cumulative Distribution Function
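A brief sketch with scipy.stats.norm showing the PDF and CDF of a standard normal and checking the empirical rule:
from scipy.stats import norm
print(norm.pdf(0))   # PDF at the mean ≈ 0.3989
print(norm.cdf(0))   # CDF at the mean = 0.5
# Empirical rule: P(μ - kσ < X < μ + kσ) for k = 1, 2, 3
for k in (1, 2, 3):
    print(norm.cdf(k) - norm.cdf(-k))   # ≈ 0.6827, 0.9545, 0.9973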
Normality testing – Skewness & Kurtosis:
Skewness :
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean.
In essence, it measures how much a given distribution differs from a normal distribution (which is symmetric).
Skewness can be Positively Skewed or Negatively skewed.
Positively Skewed:
- Distribution skewed to the right
- Tail spread to the right
- Mode < Median < Mean
- Example: Wealth distribution, length of comments on YouTube
Negatively Skewed:
- Distribution skewed to the left
- Tail spread to the left
- Mean < Median < Mode
- Example: Human life span (fewer people die at an early age)
Measure of Skewness :
Skewness = 0: Symmetric, as in a normal distribution
Skewness > 0: Positively skewed
Skewness < 0: Negatively skewed
Formula:
Pearson’s first coefficient = (Mean − Mode) / Standard Deviation
Pearson’s second coefficient = 3(Mean − Median) / Standard Deviation
If the value is positive, skewness is positive
If the value is negative, skewness is negative
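A small sketch computing Pearson’s second coefficient on the same sample used in the implementation below:
import numpy as np
data = np.array([88, 85, 82, 97, 67, 77, 74, 86, 81, 95, 77, 88, 85, 76, 81])
pearson_2 = 3 * (data.mean() - np.median(data)) / data.std(ddof=1)
print(pearson_2)   # ≈ 0.23, slightly positive, matching the small positive skewness below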
Kurtosis :
In statistics, kurtosis is a measure of relative peakedness of a probability distribution, or alternatively how heavy or how light its tails are.
- Positive excess kurtosis — when excess kurtosis, given by (kurtosis − 3), is positive, the distribution has a sharp peak and is called leptokurtic
For Fisher’s definition, kurtosis > 0
For Pearson’s definition, kurtosis > 3
- Negative excess kurtosis — when excess kurtosis, given by (kurtosis − 3), is negative, the distribution has a flat peak and is called platykurtic
For Fisher’s definition, kurtosis < 0
For Pearson’s definition, kurtosis < 3
- Zero excess kurtosis — when excess kurtosis, given by (kurtosis − 3), is zero, the distribution has the same kurtosis as a normal distribution and is called mesokurtic
For Fisher’s definition, kurtosis = 0
For Pearson’s definition, kurtosis = 3
Python implementation for Skewness & kurtosis:
# Importing library
import numpy as np
import pandas as pd
import scipy.stats as stats
from scipy.stats import skew
from scipy.stats import kurtosis
# Creating data for skewness
Skewed_data=[88, 85, 82, 97, 67, 77, 74, 86, 81, 95, 77, 88, 85, 76, 81]
Skewed_data_df= pd.Series(Skewed_data)
print("Skewness importing skew : ",skew(Skewed_data,bias=False))
print("Skewness importing stats :",stats.skew(Skewed_data,bias=False))
print("Skewness using array :",Skewed_data_df.skew())
print("-------------------------------------------------------")
# Creating data Kurtosis
Kurtosis_data=[88, 85, 82, 97, 67, 77, 74, 86, 81, 95, 77, 88, 85, 76, 81]
Kurtosis_data_df= pd.Series(Kurtosis_data)
print("Kurtosis importing Kurtosis : ",kurtosis(Kurtosis_data,bias=False,fisher=True))
print("Kurtosis importing stats :",stats.kurtosis(Kurtosis_data,bias=False,fisher=False))
print("Kurtosis using array :",Kurtosis_data_df.kurtosis())
#Note:
#If fisher=True, Fisher's definition is used (where, normal = 0.0).
#If fisher=False,Pearson's definition is used (where, normal = 3.0).
#We use the argument bias=False to calculate the sample skewness and kurtosis as opposed to the population skewness and kurtosis.
Output:
Skewness importing skew :  0.0326966578855933
Skewness importing stats : 0.0326966578855933
Skewness using array : 0.0326966578855933
-------------------------------------------------------
Kurtosis importing Kurtosis :  0.11815715154945083
Kurtosis importing stats : 3.118157151549451
Kurtosis using array : 0.11815715154945172
Central Limit Theorem:
- Definition: The central limit theorem states that the distribution of sample means approximates a normal distribution as the sample size gets larger (assuming that all samples are identical in size), regardless of population distribution shape.
- Easy explanation: Whether or not the original distribution is normal, if I take many samples of size n ≥ 30 and calculate each sample mean, then plotting all the sample means gives an approximately normal distribution.
Properties:
- Sampling Distribution Mean(μₓ¯) = Population Mean(μ)
- Sampling distribution’s standard deviation (standard error) = σ/√n ≈ S/√n
- For n ≥ 30, the sampling distribution becomes approximately normal.
# importing library
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
np.random.seed(42) # to make analysis reproducible
# Loading dataset
data = pd.read_csv('/content/sample_data/california_housing_train.csv')
data.rename(columns = {'housing_median_age':'age'}, inplace = True)
# Using age to see the population distribution
sns.displot(data.age, kde=True)
# Does not look like a normal distribution
# Mean & standard deviation for population
print('Population mean :',data.age.mean())
print('Population std :',data.age.std())
Output:
Population mean : 28.58935294117647
Population std : 12.586936981660399
# Start sampling and build the sampling distribution
sam_size = 30
sample_means = pd.Series([data.age.sample(sam_size).mean() for i in range(1000)])  # pd.Series converts the list so we can call .mean() and .std()
print('No of sample means :',len(sample_means))
print('Sample mean :',sample_means.mean())
print('Sample std :',sample_means.std())
# Plotting the density for the sample means.
sns.distplot(sample_means, kde=True, color="darkblue")
plt.xlabel('Sample Mean')
plt.ylabel('Density')
plt.title('Sampling Distribution of the Sample Mean (Central Limit Theorem)')
# Looks like a normal distribution
Output:
No of sample means : 1000
Sample mean : 28.521800000000002
Sample std : 2.270686312965189
Log Normal distribution:
- In probability theory, a log normal distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed.
- If X follows a log normal distribution, then Y = ln(X) follows a normal distribution (ln is the natural log).
- To reverse: X = exp(Y).
Python implementation to see log normal distribution:
# Creating log normal data since we don't have real data
log_normal_dis_data = np.random.lognormal(4, 1, 10000)  # mean=4, sigma=1 of the underlying normal
plt.subplot(2,2,1)
sns.distplot(log_normal_dis_data)
plt.title("Log Normal distribution")
Normal_dis_data = np.log(log_normal_dis_data)
plt.subplot(2,2,2)
sns.distplot(Normal_dis_data)
plt.title("Normal distribution")
# To convert back from normal to log normal
log_normal_dis_data = np.exp(Normal_dis_data)
plt.subplot(2,2,3)
sns.distplot(log_normal_dis_data)
plt.title("Back to Log Normal distribution")
plt.tight_layout()
Power Law Distribution:
- Power law is a functional relationship between two quantities where a relative change in one quantity results in a relative change in the other quantity proportional to a power of the change.
- One quantity varies as a power of another.
- Example: 80% of wealth is held by 20% of the total population
Python implementation to understand Power Law Distribution:
import numpy as np
import matplotlib.pyplot as plt
# Parameters for the Power Law distribution
alpha = 2.5 # shape parameter
xmin = 1.0 # minimum value
# Generate data points for x-axis
x = np.linspace(1, 10, 1000) # Range of x values
# Calculate the probability density function (PDF) for each data point
pdf = (alpha-1) * xmin**(alpha-1) * (1/x)**alpha
# Plot the PDF
plt.plot(x, pdf, color='blue', label='Power Law PDF')
# Add labels and title
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.title('Power Law Probability Density Function')
plt.legend()
plt.grid(True)
# Show the plot
plt.show()
Pareto Distribution:
- It is named after Italian civil engineer Vilfredo Pareto
- It is based on the power law probability distribution
- It is closely associated with the 80-20 rule (the Pareto principle)
- Example: a large portion of wealth is held by a small fraction of the population
Python implementation to understand Pareto Distribution:
import numpy as np
import matplotlib.pyplot as plt
# Parameters for the Pareto distribution
alpha = 2.5 # shape parameter
xm = 1.0 # scale parameter (minimum value)
# Generate data points for x-axis
x = np.linspace(xm, 10, 1000) # Range of x values (the PDF is zero below xm)
# Calculate the probability density function (PDF) for each data point
pdf = (alpha * xm**alpha) / (x**(alpha+1))
# Plot the PDF
plt.plot(x, pdf, color='red', label='Pareto PDF')
# Add labels and title
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.title('Pareto Probability Density Function')
plt.legend()
plt.grid(True)
# Show the plot
plt.show()
BOX COX Transformation:
The statisticians George Box and David Cox developed a procedure to identify an appropriate exponent (lambda, λ) to use to transform data into a “normal shape.” The lambda value indicates the power to which all data should be raised: the transformed value is y = (x^λ − 1)/λ for λ ≠ 0, and y = ln(x) for λ = 0.
Python implementation for box cox transformation from non-normal data to normal data:
# Creating non-normal (exponential) data since we don't have real data
Non_normal_dis_data=np.random.exponential(10,1000)
plt.subplot(2,2,1)
sns.distplot(Non_normal_dis_data)
plt.title("Non_normal_dis_data")
# Transform to normal Distribution using boxcox
Normal_dis_data,fitted_lambda = stats.boxcox(Non_normal_dis_data)
plt.subplot(2,2,2)
sns.distplot(Normal_dis_data)
plt.title("Normal distribution")
# Compare: try to normalize with a simple log transform
Log_dis_data = np.log(Non_normal_dis_data)
plt.subplot(2,2,3)
sns.distplot(Log_dis_data)
plt.title("Try for Normal distribution by log")
# rescaling the subplots
plt.tight_layout()
print(f"Lambda value used for Transformation: {fitted_lambda}")
Output: Lambda value used for Transformation: 0.22930731384394407
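To map the transformed values back to the original scale, scipy provides an inverse transform; a quick sketch reusing the variables from the snippet above:
from scipy.special import inv_boxcox
recovered = inv_boxcox(Normal_dis_data, fitted_lambda)
print(np.allclose(recovered, Non_normal_dis_data))   # True, up to floating-point error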