**Definition of Statistics**:

**Types of Statistics:**

**01. Descriptive Statistics:** Methods of organizing, summarizing, and presenting data in an informative way.

**02. Inferential Statistics:** Methods used to reach a conclusion about the population on the basis of a sample.

**Example:-**
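As a minimal sketch of the distinction, the snippet below summarizes a sample (descriptive) and then uses it to estimate the population mean (inferential). The exam scores are simulated illustration data, and the confidence interval assumes a normal approximation, which is reasonable only for reasonably large random samples.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated exam scores (illustration only).
random.seed(42)
population = [random.gauss(70, 10) for _ in range(10_000)]

# Draw a random sample of 100 scores from the population.
sample = random.sample(population, 100)

# Descriptive statistics: summarize the sample we actually observed.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential statistics: use the sample to estimate the population mean
# with an approximate 95% confidence interval (normal approximation).
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
margin = z * sample_sd / len(sample) ** 0.5
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"Sample mean: {sample_mean:.2f}, sample SD: {sample_sd:.2f}")
print(f"Approx. 95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The first two numbers only describe the 100 observed scores; the interval is a statement about the whole population, which is what makes it inferential.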

**Applications of Statistical Concepts:**

- Finance – Correlation, regression, time series analysis
- Marketing – Hypothesis testing, chi-square test, non-parametric testing
- Academic Research – Hypothesis testing, chi-square test, non-parametric testing
- Operations management – Hypothesis testing, ANOVA, time series analysis
- Retailing – Sales data, distribution analysis, in-store promotion, new product development

**Basic Terms of Statistics:**

**Population**: A collection or group of individuals, objects, or events whose properties are to be analyzed. Its size is represented by **N**.

**Sample**: A subset of the population. Its size is represented by **n**.

**Variable**: A characteristic of each individual element of a population or sample. It can take on many values.

**Data:** The observed values of the variable. Data may be singular or plural.

**Singular data:** Value of the variable associated with one element of a population or sample. This may be a number, a symbol, or a word.

**Plural data:** Set of values collected for the variable from each of the elements belonging to the sample.

**Parameter:** A parameter is a number describing a whole population.

**Statistic:** A statistic is a number describing a sample.
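The terms above can be tied together in a short sketch. The heights are made-up illustration values, and the names `population`, `sample`, `parameter`, and `statistic` are chosen here only to mirror the definitions.

```python
import random
import statistics

# Hypothetical population: heights (cm) of all 10 members of a small club.
# The variable is "height"; each number is one observed value (data).
population = [162.0, 175.5, 168.2, 181.0, 159.4,
              170.3, 177.8, 165.1, 172.6, 169.9]
N = len(population)          # population size

random.seed(0)               # fixed seed so the sample is reproducible
sample = random.sample(population, 4)
n = len(sample)              # sample size

parameter = statistics.mean(population)  # describes the whole population
statistic = statistics.mean(sample)      # describes only the sample

print(f"N = {N}, n = {n}")
print(f"Population mean (parameter): {parameter:.2f} cm")
print(f"Sample mean (statistic): {statistic:.2f} cm")
```

The parameter is fixed for a given population, while the statistic changes from one sample to the next.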

**Cochran’s Formula:**

Cochran’s formula lets us calculate an ideal **sample size** given a desired level of precision, a desired confidence level, and the estimated proportion of the attribute present in the population.

The Cochran formula is: $n_0 = \frac{Z^2 p q}{e^2}$

Where:

- e is the desired level of precision (i.e. the margin of error),
- p is the (estimated) proportion of the population that has the attribute in question,
- q is 1 – p.
- Z is the z-value for the desired confidence level, found in a Z table.

Example:

Suppose we are doing a study on the inhabitants of a large town, and want to find out **how many households serve breakfast** in the mornings.

We don’t have much information on the subject to begin with, so we’re going to assume that **half of the families** serve breakfast: this gives us maximum variability.

So **p = 0.5**. Now let’s say we want **95% confidence** and a precision of plus or minus 5 percent, so e = 0.05. A 95% confidence level gives us a **Z value of 1.96**, per the normal tables, so we get

$n_0 = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} = 384.16$, which we round up to **385**.

So a random sample of 385 households in our target population should be enough to give us the confidence levels we need.
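The calculation above can be sketched in Python. The function name `cochran_sample_size` and its parameters are illustrative, not part of any library; the z-value is computed with the standard library's `statistics.NormalDist` instead of being looked up in a table, and the result is rounded up because a sample cannot contain a fraction of a household.

```python
import math
from statistics import NormalDist


def cochran_sample_size(confidence: float, p: float, e: float) -> int:
    """Return Cochran's n0, rounded up to the next whole unit.

    confidence: desired confidence level, e.g. 0.95
    p: estimated proportion of the population with the attribute
    e: desired margin of error, e.g. 0.05
    """
    # Two-sided z-value for the confidence level, e.g. 0.95 -> about 1.96.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    q = 1 - p
    n0 = (z ** 2) * p * q / (e ** 2)
    return math.ceil(n0)


# Breakfast example: 95% confidence, p = 0.5, margin of error 5%.
print(cochran_sample_size(confidence=0.95, p=0.5, e=0.05))  # 385
```

Using p = 0.5 maximizes the product pq, so it gives the most conservative (largest) sample size when nothing is known about the true proportion.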

**Types of variable:**

**Qualitative, Attribute, or Categorical variable –** Non-numeric characteristic.

Example: Gender, eye color, hair color, country name, types of flowers, etc.

**Quantitative variable –** Numeric characteristic.

Example: Height, weight, number of children, etc.

Quantitative variables are classified as follows:

**Discrete variable:** Contains values that are whole numbers only.

Example: Number of children – 2, 3, 5, etc.

**Continuous variable:** Contains values that can be whole numbers or decimal numbers.

Example: Height – 175.25 cm, 180.5 cm, etc.

**Data Types in Statistics:**

While doing Exploratory Data Analysis in a data science project, we should have a good understanding of the different data types, since certain statistical measurements apply only to specific data types. A data type is also known as a **measurement scale**.

Also, we need to know which visualization method fits the particular data type.

Data types are divided into the following two categories:

**01. Quantitative Data:**

- Expressed as a number and measured by numerical variables.
- Represented by **line graphs**, **bar graphs**, **scatter plots**, etc.
- Examples: Exam score – 74, 76, 98, etc.; Weight – 85.2 kg, 56 kg, etc.

Quantitative data are of two types:

**Discrete data:** Only whole (integer) numbers. Cannot be divided into smaller parts.

Example: Number of students – 25

**Continuous data:** Whole numbers and decimal numbers. Can take any value between two whole numbers.

Example: Weight of a person – 67.4 kg

Continuous data is divided into two types:

**Interval data:** No meaningful (absolute) zero; negative values are possible.

Example: Temperature (°C or °F, but not Kelvin), dates, time gap, etc.

**Ratio data:** Has an absolute zero; negative values are not possible.

Example: Age, height, weight, length, temperature (in Kelvin, but not °C or °F), etc.
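The practical consequence of the missing absolute zero is that differences are meaningful on an interval scale but ratios are not. A small sketch with two made-up temperatures illustrates this; the helper `celsius_to_kelvin` is defined here only for the example.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert degrees Celsius to kelvins."""
    return c + 273.15


warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on both scales: the gap is 10 degrees either way.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on the ratio scale (Kelvin).
ratio_c = warm_c / cool_c  # 2.0, but 20 °C is not "twice as hot" as 10 °C
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"Difference: {diff_c:.1f} °C, {diff_k:.1f} K")
print(f"Ratio in Celsius: {ratio_c:.2f} (misleading)")
print(f"Ratio in Kelvin: {ratio_k:.3f}")
```

The Celsius ratio depends on where the scale places its arbitrary zero, while the Kelvin ratio reflects a real physical proportion.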

**02. Qualitative Data:**

- Can’t be expressed in numbers.
- Consists of words, pictures, or symbols.
- Also known as categorical data, because it is sorted by category, not by number.
- Represented by **pie charts**.
- Examples: Colors – Blue, green, red, etc.; Country – USA, UK, Italy, etc.

Qualitative data are of two types:

**Nominal Data:** Labels or names for variables, with no particular order.

Example: Gender – Men, Women, etc.; Eye color – Brown, black, blue, etc.

**Ordinal Data:** Categorical data with a meaningful order. We can assign numbers based on relative position, but we cannot do arithmetic with those numbers.

Example: Rank in a competition – First, second, third, etc.; Rating of a product – 1, 2, 3, 4, 5, 6, 7, 8, etc.
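The difference shows up in which summaries make sense. In the sketch below, with made-up values, nominal data is summarized by its mode, while ordinal data also supports a median because its categories can be sorted. The `rank` mapping is a hypothetical encoding used only for ordering.

```python
import statistics

# Nominal data: labels with no order, so the mode is the natural summary.
eye_colors = ["brown", "blue", "brown", "black", "brown", "blue"]
most_common_color = statistics.mode(eye_colors)

# Ordinal data: categories with an order. The rank mapping below is a
# hypothetical encoding used only for sorting, not for arithmetic.
rank = {"poor": 1, "fair": 2, "good": 3, "excellent": 4}
ratings = ["good", "poor", "excellent", "good", "fair"]

ordered = sorted(ratings, key=rank.get)
median_rating = ordered[len(ordered) // 2]  # middle value of an odd-length list

print(most_common_color)  # brown
print(median_rating)      # good
```

Averaging the rank numbers would assume equal spacing between categories, which ordinal data does not guarantee, so the sketch sticks to order-based summaries.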