Definition of Statistics:
Types of Statistics:
01. Descriptive Statistics: Methods of organizing, summarizing, and presenting data in an informative way.
02. Inferential Statistics: Methods used to reach a conclusion about the population on the basis of a sample.
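To make the difference concrete, here is a minimal Python sketch. It assumes a small, hypothetical sample of exam scores (illustrative data, not from any study). The descriptive part summarizes only the sample. The inferential part uses that sample to estimate the population mean with an approximate 95% confidence interval based on the normal distribution. For a sample this small, a t-distribution would normally be preferred.

```python
import statistics
from statistics import NormalDist

# Hypothetical sample: exam scores of 10 students (illustrative data only)
sample = [74, 76, 98, 81, 67, 88, 90, 72, 85, 79]

# Descriptive statistics: organize and summarize the sample itself
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: draw a conclusion about the population from the sample.
# Approximate 95% confidence interval for the population mean (normal approximation).
z = NormalDist().inv_cdf(0.975)
margin = z * stdev / len(sample) ** 0.5
ci = (mean - margin, mean + margin)

print(f"Mean: {mean}, median: {median}, standard deviation: {stdev:.2f}")
print(f"Approximate 95% CI for the population mean: ({ci[0]:.1f}, {ci[1]:.1f})")
```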
Applications of Statistical Concepts:
- Finance – Correlation, regression, time series analysis
- Marketing – Hypothesis testing, chi-square test, non-parametric testing
- Academic Research – Hypothesis testing, chi-square test, non-parametric testing
- Operations management – Hypothesis testing, ANOVA, time series analysis
- Retailing – Sales data, distribution analysis, in-store promotion, new product development
Basic Terms of Statistics:
Cochran’s Formula:
Cochran’s formula lets you calculate an ideal sample size given a desired level of precision, a desired confidence level, and the estimated proportion of the population that has the attribute of interest.
The Cochran formula is: $n_0 = \frac{Z^2 p q}{e^2}$
Where:
- e is the desired level of precision (i.e. the margin of error),
- p is the (estimated) proportion of the population that has the attribute in question,
- q is 1 – p,
- Z is the z-value for the chosen confidence level, found in a Z table.
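Instead of reading Z from a printed table, it can be computed from the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`. The helper name `z_value` is hypothetical. It assumes a two-sided interval, which is the usual case for a "plus or minus" margin of error, so it looks up the 1 − α/2 quantile.

```python
from statistics import NormalDist

def z_value(confidence: float) -> float:
    """Two-sided critical z-value for a confidence level such as 0.95.

    Equivalent to looking up the 1 - alpha/2 quantile in a standard normal (Z) table.
    """
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> Z = {z_value(level):.2f}")
```

Rounded to two decimals, this gives the familiar table values 1.64, 1.96, and 2.58.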
Example:
Suppose we are doing a study on the inhabitants of a large town, and want to find out how many households serve breakfast in the mornings.
We don’t have much information on the subject to begin with, so we’re going to assume that half of the families serve breakfast: this gives us maximum variability.
So p = 0.5. Now let’s say we want 95% confidence and a precision of at least ±5 percent. A 95% confidence level gives us a Z value of 1.96, per the normal tables, so we get:
$n_0 = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} = 384.16$, which we round up to 385.
So a random sample of 385 households in our target population should be enough to give us the confidence levels we need.
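The breakfast calculation can be reproduced with a short Python sketch of Cochran's formula. The function name `cochran_sample_size` is hypothetical. The result is rounded up with `math.ceil`, because rounding down would give slightly less precision than requested.

```python
import math

def cochran_sample_size(z: float, p: float, e: float) -> int:
    """Cochran's formula n0 = Z^2 * p * q / e^2, rounded up to a whole sample."""
    q = 1 - p
    n0 = (z ** 2) * p * q / (e ** 2)
    return math.ceil(n0)

# Breakfast example: 95% confidence (Z = 1.96), p = 0.5, precision of +/-5%
print(cochran_sample_size(z=1.96, p=0.5, e=0.05))  # 385
```

Because p·q is largest when p = 0.5, assuming p = 0.5 gives the most conservative (largest) sample size. Any other estimate of p yields a smaller n0.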
Types of Variables:
- Qualitative or Attribute or Categorical variable – Non-numeric characteristic.
Example: Gender, eye color, hair color, country name, types of flowers, etc.
- Quantitative variable – Numeric characteristic.
Example: Height, weight, number of children, etc.
Quantitative variables are classified as below:
- Discrete variable: Takes only whole-number (countable) values.
Example: Number of children – 2, 3, 5, etc.
- Continuous variable: Can take any value within a range, including decimal values.
Example: Height – 175.25 cm, 180.5 cm, etc.
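As a rough illustration of these categories, the Python sketch below guesses a variable's type from its values. The helper `classify_variable` and its rules are assumptions made for this example and are not a standard method. In particular, a continuous quantity recorded in whole units (for example, height rounded to the nearest centimetre) would be misclassified as discrete. In practice, the variable type comes from what is being measured, not from how the numbers happen to look.

```python
def classify_variable(values: list) -> str:
    """Rough heuristic (assumption): guess a variable's type from its Python values.

    - any str values         -> qualitative (categorical)
    - only whole numbers     -> quantitative, discrete
    - any non-whole numbers  -> quantitative, continuous
    """
    if any(isinstance(v, str) for v in values):
        return "qualitative"
    if all(float(v).is_integer() for v in values):
        return "discrete"
    return "continuous"

eye_color = ["brown", "black", "blue"]
children = [2, 3, 5]
height_cm = [175.25, 180.5, 168.0]

for name, values in [("eye_color", eye_color), ("children", children), ("height_cm", height_cm)]:
    print(name, "->", classify_variable(values))
```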
Data Types in Statistics:
When doing exploratory data analysis (EDA) in a data science project, we need a good understanding of the different data types, because certain statistical measures apply only to specific data types. Data types are also known as measurement scales.
We also need to know which visualization method fits a particular data type.
Data types are divided into the following two categories:
01. Quantitative Data:
- Expressed as numbers and measured by numerical variables.
- Represented by line graphs, bar graphs, scatter plots, etc.
- Examples: Exam score – 74, 76, 98, etc.; Weight – 85.2 kg, 56 kg, etc.
Quantitative data are of two types:
- Discrete data: Only whole (integer) numbers; cannot be divided into smaller parts.
Example: Number of students – 25
- Continuous data: Whole and decimal numbers; can take any value between two whole numbers.
Example: Weight of a person – 67.4 kg
Continuous data is divided into two types:
- Interval data: No true (absolute) zero; negative values are possible.
Example: Temperature (°C or °F, but not kelvin), dates, clock time, etc.
- Ratio data: Has a true (absolute) zero; negative values are not possible.
Example: Age, height, weight, length, temperature (in kelvin, but not °C or °F), etc.
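The practical consequence of a true zero is that ratios only make sense on a ratio scale. The Python sketch below, which uses the hypothetical helper `celsius_to_kelvin`, shows that a difference of 10 degrees is meaningful in °C. However, 20 °C is not "twice as hot" as 10 °C, because 0 °C is not an absolute zero. Converting to kelvin, where 0 K is absolute zero, gives the physically meaningful ratio.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return c + 273.15

t1_c, t2_c = 10.0, 20.0

# Interval scale (°C): differences are meaningful
diff = t2_c - t1_c  # 10 degrees warmer

# ...but ratios are not, because 0 °C is an arbitrary zero point
naive_ratio = t2_c / t1_c  # 2.0, misleading

# Ratio scale (kelvin): 0 K is absolute zero, so the ratio is meaningful
true_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(f"Difference: {diff} degrees, naive ratio: {naive_ratio}, kelvin ratio: {true_ratio:.3f}")
```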
02. Qualitative Data:
- Can’t be expressed in numbers.
- Consists of words, pictures, or symbols.
- Also known as categorical data, because it is sorted by category, not by number.
- Represented by pie charts.
- Examples: Colors – Blue, green, red, etc. Country – USA, UK, Italy etc.
Qualitative data are of two types:
- Nominal data: Labels or names with no particular order.
Example: Gender – men, women, etc.; Eye color – brown, black, blue, etc.
- Ordinal data: Categorical data with a natural order. Numbers can be assigned based on relative position, but arithmetic on those numbers is not meaningful.
Example: Rank in a competition – first, second, third, etc.; Product rating – 1, 2, 3, 4, 5, 6, 7, 8, etc.
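The scale determines which summary statistic is appropriate. The Python sketch below uses hypothetical survey responses, assuming a 1 to 5 product rating. For nominal data, only counting is meaningful, so the mode is the natural summary. For ordinal data, order is meaningful, so the median is appropriate. The mean, however, assumes equal spacing between ratings, which ordinal data does not guarantee. The sketch uses `statistics.median_low` so that the result is always an actual rating rather than an average of two ratings.

```python
import statistics

# Hypothetical survey responses (illustrative data only)
eye_colors = ["brown", "black", "blue", "brown", "brown", "blue"]  # nominal
ratings = [4, 5, 3, 5, 2, 4, 5]  # ordinal: product rating on an assumed 1-5 scale

# Nominal: categories have no order, so use counts / the mode
most_common = statistics.mode(eye_colors)

# Ordinal: categories are ordered, so the median is meaningful
median_rating = statistics.median_low(ratings)

print("Most common eye color:", most_common)
print("Median rating:", median_rating)
```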