Seaborn
Seaborn is a Python library based on matplotlib.
It is used for data visualization and provides a high-level interface for drawing attractive and informative statistical graphics.
The conventional alias for seaborn is sns.
# Import required libraries including seaborn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# List the example datasets that seaborn offers for practice (they are fetched online, so this needs an internet connection)
sns.get_dataset_names()
Output: ['anagrams', 'anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'dowjones', 'exercise', 'flights', 'fmri', 'geyser', 'glue', 'healthexp', 'iris', 'mpg', 'penguins', 'planets', 'seaice', 'taxis', 'tips', 'titanic']
# Load the 'iris' dataset
ds = sns.load_dataset('iris')
ds.head()
Output:
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
ds.keys()
Output: Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], dtype='object')
Let’s start plotting with Seaborn.
Line plot
sns.lineplot(x='sepal_length',y='sepal_width',data=ds,hue='species')
Scatter Plot
- The ‘x’ parameter names the column plotted on the x-axis (the first feature).
- The ‘y’ parameter names the column plotted on the y-axis (the second feature).
- The ‘data’ parameter specifies the dataset we are drawing the columns from.
- The ‘hue’ parameter specifies the feature based on which the points are going to be colored.
- The ‘palette’ parameter specifies the colors to be used in the plot. It is not passed below, so the default palette applies.
- The ‘markers’ parameter determines the shape of the points.
- The ‘style’ parameter names the column whose classes are mapped to the markers.
- The ‘s’ parameter specifies the size of the points.
- The ‘alpha’ parameter controls the opacity of the datapoints. It is not passed below, so the points are fully opaque.
- The ‘legend’ parameter controls whether a legend is drawn. Setting it to False hides the legend; the call below leaves it at its default, so the legend is shown.
sns.scatterplot(x='sepal_length',y='sepal_width',data=ds,hue='species',markers = [',', '^', 'P'],style = 'species',s = 100)
Use set() function
- The sns.set() function is used to set the aesthetic parameters for the plots created by Seaborn. It allows you to set the theme, color palette, and other parameters to control the overall look of your plots. It is an alias of sns.set_theme(), which is the preferred name in seaborn 0.11 and later.
- If you do not call sns.set(), your plots use the default Matplotlib style rather than Seaborn's theme, so they may look less polished and less consistent with each other.
sns.set()
sns.scatterplot(x='sepal_length',y='sepal_width',data=ds,hue='species',markers = [',', '^', 'P'],style = 'species',s = 100)
Bar plot
A bar plot needs categorical data on the x-axis and continuous numerical data on the y-axis.
By default, the height of each bar is the mean of the numerical column within that category, and an error bar shows the uncertainty around that mean.
sns.barplot(x='species',y='petal_length',data=ds)
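To make the bar heights concrete, here is a minimal sketch of the per-category mean that a bar plot draws by default, written with the standard library only. The setosa values are the five petal_length readings shown by ds.head() above; the versicolor and virginica values are made-up numbers for illustration, not rows from the dataset.

```python
from collections import defaultdict
from statistics import mean

# (species, petal_length) pairs. The setosa rows come from ds.head() above;
# the other two species use hypothetical values chosen for illustration only.
rows = [
    ("setosa", 1.4), ("setosa", 1.4), ("setosa", 1.3), ("setosa", 1.5), ("setosa", 1.4),
    ("versicolor", 4.0), ("versicolor", 4.5), ("versicolor", 5.0),
    ("virginica", 5.5), ("virginica", 6.0), ("virginica", 6.5),
]

# Group the numerical values by category.
groups = defaultdict(list)
for species, petal_length in rows:
    groups[species].append(petal_length)

# One bar per category; the bar height is the mean of that category's values.
bar_heights = {species: mean(values) for species, values in groups.items()}
print(bar_heights)
```

Each key in bar_heights corresponds to one bar on the x-axis, and its value to the bar's height.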
Count Plot
- It counts the observations in each category and draws one bar per category showing that count
sns.countplot(x='species',data=ds)
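The counting step behind a count plot can be sketched in a few lines. The label list below is hypothetical and much shorter than the real species column; it only illustrates the tally.

```python
from collections import Counter

# Hypothetical category labels for illustration (not the real iris column).
species = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

# A count plot draws one bar per distinct label, with the tally as its height.
counts = Counter(species)
for label, count in counts.items():
    print(label, count)
```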
Categorical plot
catplot() is a figure-level function that draws any of the categorical plot types, selected with the kind parameter.
The default value for kind is strip.
We can use point, bar, or count for categorical estimates.
We can use box, violin, or boxen for categorical distributions.
We can use strip or swarm for categorical scatterplots.
sns.catplot(x='species',y='petal_length',data=ds)
sns.catplot(x='species',y='petal_length',data=ds,kind='bar')
sns.catplot(x='species',y='petal_length',data=ds,kind='box')
Box Plot
It is also known as a box-and-whisker plot.
It shows the distribution of quantitative data in a way that makes it easy to compare variables or the levels of a categorical variable.
The box shows the quartiles of the dataset, while the whiskers extend to show the rest of the distribution, except for points classified as outliers, which are drawn as individual dots beyond the whiskers.
# Vertical Box Plot
sns.boxplot(x='species',y='sepal_width',data=ds)
# Horizontal box plot ( Switching x & y)
sns.boxplot(y='species',x='sepal_width',data=ds)
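The numbers behind a box plot can be worked out by hand. The sketch below assumes the common 1.5 × IQR whisker rule (seaborn's default whis=1.5) and linear interpolation for the quartiles; the values are hypothetical, with 4.4 included so that one point falls outside the whiskers.

```python
from statistics import quantiles

# Hypothetical sepal-width-like values; 4.4 is included to act as an outlier.
values = [2.9, 3.0, 3.0, 3.1, 3.2, 3.2, 3.4, 3.5, 4.4]

# method="inclusive" interpolates linearly between the sorted data points.
q1, median, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # height of the box

# Whiskers reach the most extreme data points within 1.5 * IQR of the box.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = [v for v in values if low_fence <= v <= high_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Anything beyond the fences is drawn as an individual dot.
outliers = [v for v in values if v < low_fence or v > high_fence]

print("box:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

The whiskers stop at actual data points, not at the fences themselves, which is why the upper whisker ends at 3.5 here.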
Violin plot
Similar to a box plot, except that it also uses kernel density estimation to show the shape of the data distribution, which gives a fuller description than the quartiles alone.
# Vertical plot
sns.violinplot(x='species',y='sepal_width',data=ds)
# Horizontal plot
sns.violinplot(y='species',x='sepal_width',data=ds)
Strip plot
- It draws a scatter plot in which one of the variables is categorical
sns.stripplot(x='species',y='sepal_width',data=ds)
Swarm plot
- The swarmplot() function is similar to stripplot(), but it adjusts the position of each point along the categorical axis so that the points do not overlap
sns.swarmplot(x='species',y='sepal_width',data=ds)
Point Plots
- Point plots serve the same purpose as bar plots but in a different style. Rather than a full bar, the value of the estimate is represented by a point at a certain height on the other axis.
sns.pointplot(x='species',y='sepal_width',data=ds)
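A point plot marks the estimate (the mean by default) and, in seaborn's default settings, an error bar based on a bootstrap confidence interval. The following is a rough standard-library sketch of that idea, not seaborn's own implementation. It uses the sepal_width values of the five rows shown by ds.head(); the seed and the 1000 resamples are choices made for this example.

```python
import random
from statistics import mean

# sepal_width of the five rows shown by ds.head() above.
values = [3.5, 3.0, 3.2, 3.1, 3.6]
estimate = mean(values)  # height of the point

# Bootstrap: resample with replacement and recompute the mean many times.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
boot_means = sorted(
    mean(rng.choices(values, k=len(values))) for _ in range(1000)
)

# Approximate 2.5th and 97.5th percentiles give a 95% interval.
ci_low, ci_high = boot_means[25], boot_means[974]
print(estimate, ci_low, ci_high)
```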
Histogram
It groups numerical data into bins and draws one bar per bin for the number of observations that fall into it.
It is a graphical representation of the distribution of numerical data.
sns.histplot(x='petal_width',data=ds,hue='species',kde=True) # kde=kernel density estimate
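The binning step of a histogram can be sketched as follows. The helper name, the bin range, and the petal-width-like values are all hypothetical; seaborn chooses the number and width of bins automatically unless told otherwise.

```python
def histogram(values, bins, low, high):
    """Count how many values fall into each of `bins` equal-width bins."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that a value equal to `high` lands in the last bin.
        idx = min(int((v - low) / width), bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical petal-width-like values for illustration.
values = [0.2, 0.2, 0.2, 0.2, 0.2, 1.3, 1.6, 1.4, 2.1, 2.3, 1.8]

# Five bins of width 0.5 covering the range 0.0 to 2.5.
counts = histogram(values, bins=5, low=0.0, high=2.5)
print(counts)
```

Each entry of counts is the height of one bar, read from left to right along the x-axis.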
KDE Plot
A kernel density estimate (KDE) plot depicts an estimate of the probability density function of a continuous variable. The estimate is non-parametric, and it can be drawn for one variable or for several variables together.
Using the Python Seaborn module, we can build a KDE plot with kdeplot() and customize it through its parameters.
sns.kdeplot(x='sepal_length',data=ds,hue='species')
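To show what a kernel density estimate computes, here is a small sketch with a Gaussian kernel. The helper gaussian_kde below is written for this example and the bandwidth of 0.2 is picked by hand; seaborn selects its bandwidth automatically, so its curve will differ. The samples are the sepal_length values of the five rows shown by ds.head().

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x from the samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centred on that sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# sepal_length of the five rows shown by ds.head() above.
samples = [5.1, 4.9, 4.7, 4.6, 5.0]
density = gaussian_kde(samples, bandwidth=0.2)  # bandwidth picked by hand

# Riemann sum over a wide grid; a density should integrate to about 1.
step = 0.01
grid = [3.0 + i * step for i in range(401)]  # 3.0 to 7.0
area = sum(density(x) for x in grid) * step
print(round(area, 3))
```

The curve is highest near the samples and falls towards zero away from them, which is the smooth shape that a KDE plot draws.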
Distribution plot
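- Used for univariate analysis
- It visualizes the distribution of a single variable as a histogram, so one particular column should be chosen
- distplot() has been deprecated since seaborn 0.11; displot() (figure-level) or histplot() (axes-level) is used instead
sns.displot(data=ds,x='petal_width',kde=True)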
Joint plot
To analyze a bivariate distribution in seaborn, the jointplot() function can be used.
It creates a multi-panel figure that shows the bivariate relationship between two variables together with the univariate distribution of each variable on separate axes.
sns.jointplot(x='petal_length',y='petal_width',data=ds)
Pairplot
- It plots the pairwise relationships between the numeric columns of the data frame and supports an additional argument called hue for categorical separation.
sns.pairplot(data=ds,hue='species')
Heat Map
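- It is a graphical representation of a matrix in which each cell is colored according to its value
With seaborn's default colormap, higher values are drawn in lighter colors
and lower values are drawn in darker colors
Here it shows the correlation between the numeric columns. The species column is dropped first, because recent pandas versions raise an error when corr() is called on a data frame that contains string columns.
sns.heatmap(ds.drop(columns='species').corr(),annot=True)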
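The matrix that the heatmap colors is a table of pairwise Pearson correlations. As a rough sketch, it can be computed by hand for the five rows shown by ds.head(). The pearson helper is written for this example, and petal_width is left out because it is constant in those five rows, which would make its correlation undefined.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Columns taken from the five rows shown by ds.head() above.
columns = {
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
}

# Square matrix: entry [i][j] is the correlation of column i with column j.
names = list(columns)
corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
for name, row in zip(names, corr):
    print(f"{name:>12}", [round(v, 2) for v in row])
```

The diagonal is 1 because every column correlates perfectly with itself, and the matrix is symmetric, which is why a correlation heatmap mirrors across its diagonal.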