Seaborn Tutorial




What is Seaborn?


Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

Import Seaborn and Dataset

import seaborn as sns
%matplotlib inline
import matplotlib.pyplot as plt # required to change the plot styles 

plt.style.use('ggplot')

# import built-in seaborn dataset
tips = sns.load_dataset('tips')
tips.head()

"""
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
"""

Distribution Plots

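# distplot is deprecated since seaborn 0.11; histplot with kde=True
# draws a comparable histogram with a density curve on top.
sns.histplot(tips["total_bill"], kde=True)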

# jointplot() pairs the two univariate distributions with a bivariate plot of the same data.
# The kind parameter selects how the joint panel is drawn:
# "scatter", "reg", "resid", "kde", "hex"
sns.jointplot(x='total_bill', y='tip', data=tips)

# pairplot will plot pairwise relationships across an entire dataframe
# (for the numerical columns) and supports a color hue argument (for categorical columns).
sns.pairplot(tips, hue='sex')

# Rug and kde on one plot
sns.kdeplot(tips['tip'])
sns.rugplot(tips['tip'])

Categorical Plots


# barplot aggregates a numeric variable within each category
# using an estimator function (the mean by default):
sns.barplot(x='sex',y='total_bill',data=tips)
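
The aggregation function can be swapped through the estimator argument. The sketch below is only an illustration: it uses a tiny made-up DataFrame (the hypothetical `toy`) and passes `np.median`, then compares the bar heights with a plain pandas groupby.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up values chosen so that mean and median differ per group.
toy = pd.DataFrame({
    "sex": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "total_bill": [10.0, 14.0, 30.0, 20.0, 30.0, 70.0],
})

fig, ax = plt.subplots()
# estimator replaces the default mean with the median.
sns.barplot(x="sex", y="total_bill", data=toy, estimator=np.median, ax=ax)

# The bar heights should match the per-group medians computed by pandas.
bar_heights = sorted(p.get_height() for p in ax.patches)
group_medians = toy.groupby("sex")["total_bill"].median()
```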

# countplot
# This is essentially a barplot whose estimator counts
# the number of occurrences in each category,
# which is why we only pass the x value:
sns.countplot(x='sex',data=tips)
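
As a quick check of the "counting" claim, this sketch compares the bars drawn by countplot with `value_counts()` from pandas. The DataFrame `toy` is made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up categorical column: 4 "Male" rows and 2 "Female" rows.
toy = pd.DataFrame({"sex": ["Male", "Female", "Male", "Male", "Female", "Male"]})

fig, ax = plt.subplots()
sns.countplot(x="sex", data=toy, ax=ax)

# Each bar height is the number of rows in that category.
bar_heights = sorted(int(p.get_height()) for p in ax.patches)
counts = toy["sex"].value_counts()
```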

boxplot and violinplot


boxplots and violinplots are used to show the distribution of a quantitative variable within categories. A box plot (or box-and-whisker plot) does this in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.

sns.boxplot(x="day", y="total_bill", data=tips,palette='rainbow')
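
The numbers behind a box plot can be reproduced by hand. The sketch below assumes the default whisker rule (`whis=1.5`, i.e. 1.5 times the inter-quartile range) and uses made-up bill values; it is an illustration of the rule, not seaborn's internal code.

```python
import numpy as np

# Made-up bill amounts with one obviously extreme value.
bills = np.array([8.0, 10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 20.0, 48.0])

# The box: first quartile, median, third quartile.
q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
inside = bills[(bills >= low_fence) & (bills <= high_fence)]
outliers = bills[(bills < low_fence) | (bills > high_fence)]

# Whiskers reach the most extreme points that are still inside the fences.
whisker_low, whisker_high = inside.min(), inside.max()
```

Here the box spans 12 to 18, the whiskers reach 8 and 20, and 48 is drawn as an outlier.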

sns.boxplot(x="day", y="total_bill", hue="smoker",data=tips, palette="coolwarm")

sns.violinplot(x="day", y="total_bill", data=tips,palette='rainbow')

sns.violinplot(x="day", y="total_bill", data=tips,hue='sex',palette='Set1')

stripplot and swarmplot


The stripplot will draw a scatterplot where one variable is categorical. A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution.

The swarmplot is similar to stripplot(), but the points are adjusted (only along the categorical axis) so that they don’t overlap. This gives a better representation of the distribution of values, although it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation needed to arrange them).

sns.stripplot(x="day", y="total_bill", data=tips)
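
Since a strip plot is described above as a complement to a box plot, here is a minimal sketch of that pairing. It draws both on the same axes using a made-up DataFrame (the hypothetical `toy`); `showfliers=False` hides the box plot's own outlier markers so points are not drawn twice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up bills for two days.
toy = pd.DataFrame({
    "day": ["Thur"] * 5 + ["Fri"] * 5,
    "total_bill": [12.0, 15.5, 18.0, 22.0, 30.0, 9.0, 14.0, 16.5, 21.0, 40.0],
})

fig, ax = plt.subplots()
# Summary of the distribution first, then every observation on top of it.
sns.boxplot(x="day", y="total_bill", data=toy, showfliers=False,
            color="lightgray", ax=ax)
sns.stripplot(x="day", y="total_bill", data=toy, color="black",
              jitter=True, ax=ax)

# Every row of the data ends up as one scatter point.
n_points = sum(len(c.get_offsets()) for c in ax.collections)
```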

sns.stripplot(x="day", y="total_bill", data=tips, jitter=True, hue='sex', palette='Set1', dodge=True)

sns.swarmplot(x="day", y="total_bill", data=tips)

Combining Categorical Plots

sns.violinplot(x='day', y='total_bill', data=tips)
sns.swarmplot(x='day', y='total_bill', data=tips, color='black')

Catplot


catplot (called factorplot before seaborn 0.9) is the most general form of a categorical plot. It takes a kind parameter to select the plot type:

sns.catplot(x='sex',y='total_bill',data=tips,kind='bar')
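
To illustrate how the kind parameter switches the plot type, the sketch below loops over a few kinds. It uses `catplot`, the name factorplot has carried since seaborn 0.9, and a made-up DataFrame (the hypothetical `toy`) in place of `tips`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Made-up bills for two groups.
toy = pd.DataFrame({
    "sex": ["Female"] * 6 + ["Male"] * 6,
    "total_bill": [10.0, 12.5, 14.0, 18.0, 21.0, 30.0,
                   11.0, 16.0, 20.0, 24.5, 31.0, 45.0],
})

# Same data, same call; only the kind argument changes.
grids = {
    kind: sns.catplot(x="sex", y="total_bill", data=toy, kind=kind)
    for kind in ("bar", "box", "violin", "strip")
}
```

Each call returns a FacetGrid, which is why the same function can also split the plot into panels with arguments such as `col`.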

Matrix Plots

Matrix plots allow you to plot data as color-encoded matrices and can also be used to indicate clusters within the data.

import seaborn as sns
%matplotlib inline

# Import the built-in flights dataset
flights = sns.load_dataset('flights')
flights.head()

"""
   year     month  passengers
0  1949   January         112
1  1949  February         118
2  1949     March         132
3  1949     April         129
4  1949       May         121
"""

Heatmap


For a heatmap to work properly, your data should already be in matrix form; the sns.heatmap function then just colors in the cells for you. The flights data is in long form, so pivot it into a month-by-year matrix first:

flights.pivot_table(values='passengers',index='month',columns='year')

pvflights = flights.pivot_table(values='passengers',index='month',columns='year')
sns.heatmap(pvflights)
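
The long-form to matrix step is easier to see on a tiny example. The sketch below uses made-up passenger numbers (the hypothetical `long_form`, with months as integers) and adds `annot=True` so that each cell's value is printed on the heatmap.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up long-form data: one row per (year, month) pair.
long_form = pd.DataFrame({
    "year": [1949] * 3 + [1950] * 3,
    "month": [1, 2, 3] * 2,
    "passengers": [100, 110, 130, 115, 125, 140],
})

# Pivot to a month-by-year matrix, the shape heatmap expects.
matrix = long_form.pivot_table(values="passengers", index="month", columns="year")

fig, ax = plt.subplots()
# annot=True writes each value into its cell; fmt controls the number format.
sns.heatmap(matrix, annot=True, fmt=".0f", cmap="magma", ax=ax)
```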

sns.heatmap(pvflights,cmap='magma',linecolor='white',linewidths=1)

Clustermap


The clustermap uses hierarchical clustering to produce a clustered version of the heatmap. For example:

sns.clustermap(pvflights)
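
To see what the clustering does to the row order, here is a small sketch on a made-up matrix (hypothetical labels `m1` to `m4`). It assumes SciPy is installed, which clustermap needs, and uses `standard_scale=1` to rescale each column to the 0 to 1 range before clustering.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Made-up matrix: m1/m2 are similar to each other, as are m3/m4.
matrix = pd.DataFrame(
    [[100, 120, 150],
     [105, 125, 160],
     [140, 170, 200],
     [145, 175, 210]],
    index=["m1", "m2", "m3", "m4"],
    columns=[1949, 1950, 1951],
)

# standard_scale=1 rescales every column so no single year dominates.
g = sns.clustermap(matrix, standard_scale=1)

# Row labels in the order the clustered heatmap displays them.
row_order = [matrix.index[i] for i in g.dendrogram_row.reordered_ind]
```

Similar rows end up next to each other in `row_order`, which is the reordering the clustered heatmap displays.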
