DATA VISUALIZATION WITH SEABORN PART-3

The previous posts gave you an introduction to seaborn, and in the last article we made line graphs with the seaborn package for Python. In this tutorial, we will talk about bar charts and heatmaps.

Set up the notebook for seaborn

As always, we start by setting up the coding environment.

import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
print("Setup Complete")

Select a dataset

In this tutorial, we’ll work with a dataset from IGN containing average game ratings, on a scale from 1 to 10, for each combination of platform and genre. Find the dataset here.

Load the data

As before, we load the dataset using the pd.read_csv command.

# Path of the file to read
ign_filepath = "../input/ign_reviews.csv"

# Read the file into a variable ign_data
ign_data = pd.read_csv(ign_filepath, index_col="Platform")

You may notice that the code is slightly shorter than what we used in the previous tutorial. In this case, since the row labels (from the 'Platform' column) are not dates, we don't need to ask pandas to parse them as dates. We only pass two pieces of information:

  • the filepath for the dataset (in this case, ign_filepath), and
  • the name of the column that will be used to index the rows (in this case, index_col="Platform").
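To see what index_col does without the real file, here is a minimal sketch that reads a tiny CSV from memory with io.StringIO. The platform names are real platforms, but the scores and the three genre columns are made-up placeholders, not values from the IGN dataset.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for ../input/ign_reviews.csv (made-up scores)
csv_text = """Platform,Action,Racing,Puzzle
PC,7.0,7.1,6.9
Wii,6.3,6.2,6.7
Xbox 360,6.9,6.9,7.2
"""

# index_col="Platform" turns the Platform column into the row labels
sample_data = pd.read_csv(io.StringIO(csv_text), index_col="Platform")

print(sample_data.index.name)     # Platform
print(list(sample_data.index))    # ['PC', 'Wii', 'Xbox 360']
print(list(sample_data.columns))  # ['Action', 'Racing', 'Puzzle']
```

Notice that "Platform" no longer appears among the columns: it has become the index.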

Examine the data

Since the dataset is small, we can easily print all of its contents. This is done by writing a single line of code with just the name of the dataset.

# Print the data
ign_data
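When a table is too large to print in full, pandas can show its size and just the first few rows instead. This sketch uses the same kind of hypothetical in-memory CSV (made-up scores, not IGN data).

```python
import io

import pandas as pd

# Hypothetical CSV text (made-up scores)
csv_text = """Platform,Action,Racing,Puzzle
PC,7.0,7.1,6.9
Wii,6.3,6.2,6.7
Xbox 360,6.9,6.9,7.2
"""
sample_data = pd.read_csv(io.StringIO(csv_text), index_col="Platform")

# (rows, columns): number of platforms by number of genres
print(sample_data.shape)  # (3, 3)

# Only the first two rows
print(sample_data.head(2))
```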

Bar chart

Say we’d like to use seaborn to create a bar chart showing the average score for racing games on each platform.

# Set the width and height of the figure
plt.figure(figsize=(8, 6))

# Bar chart showing average score for racing games by platform
sns.barplot(x=ign_data['Racing'], y=ign_data.index)

# Add label for horizontal axis
plt.xlabel("Average score")

# Add title
plt.title("Average Score for Racing Games, by Platform")

The commands for customizing the text (title and horizontal axis label) and size of the figure are familiar from the previous tutorial. The code that creates the bar chart is new:

# Bar chart showing average score for racing games by platform
sns.barplot(x=ign_data['Racing'], y=ign_data.index)

It has three main components:

  • sns.barplot – This tells the notebook that we want to create a bar chart.
    • Remember that sns refers to the seaborn package, and all of the commands that you use to create charts in this series of tutorials will start with this prefix.
  • x=ign_data['Racing'] – This determines what to use on the horizontal axis. Here we select the 'Racing' column, so the length of each bar shows the average score for racing games.
  • y=ign_data.index – This determines what to use on the vertical axis. Here we use the row labels (the platforms), so each platform gets its own bar.
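Here is a self-contained sketch of the same call on a small hypothetical table (made-up racing scores for three platforms). It selects matplotlib's non-interactive "Agg" backend, an assumption that lets it run outside a notebook, and then reads the bar widths back to confirm that numeric x with categorical y gives horizontal bars whose length is the score.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical average racing scores, indexed by platform (made-up numbers)
sample_data = pd.DataFrame(
    {"Racing": [7.1, 6.2, 6.9]},
    index=pd.Index(["PC", "Wii", "Xbox 360"], name="Platform"),
)

plt.figure(figsize=(8, 6))
# Numeric x and categorical y: seaborn draws one horizontal bar per platform
ax = sns.barplot(x=sample_data["Racing"], y=sample_data.index)
plt.xlabel("Average score")
plt.title("Average Score for Racing Games, by Platform")

# For horizontal bars, each bar's width is the plotted value
bar_widths = [round(bar.get_width(), 2) for bar in ax.patches]
print(bar_widths)
```

Because each platform appears only once, seaborn's default estimator (the mean) simply reproduces the stored value, so each bar's length equals that platform's score.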

Important Note: You must select the indexing column with ign_data.index; ign_data['Platform'] does not work and raises an error (a KeyError). This is because when we loaded the dataset, the "Platform" column was used to index the rows, so it is no longer an ordinary column. As long as "Platform" is the index, we have to use this special notation to select it.
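The sketch below reproduces the error on a tiny hypothetical table (made-up scores) and shows two ways around it: reading the index directly, or, if you would rather treat Platform as a regular column, moving it back with the standard pandas method reset_index().

```python
import pandas as pd

# Hypothetical two-row table with Platform as the index (made-up scores)
sample_data = pd.DataFrame(
    {"Racing": [7.1, 6.2]},
    index=pd.Index(["PC", "Wii"], name="Platform"),
)

# Platform is the index, not a column, so column selection fails
try:
    sample_data["Platform"]
except KeyError as err:
    print("KeyError:", err)

# The index itself holds the platform names
print(list(sample_data.index))  # ['PC', 'Wii']

# Alternative: reset_index() returns a copy with Platform as an ordinary column
as_column = sample_data.reset_index()
print(list(as_column["Platform"]))  # ['PC', 'Wii']
```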

Heatmap

Now we will learn about another chart type: heatmaps.

In the code cell below, we create a heatmap to quickly visualize patterns in ign_data. Each cell is color-coded according to its corresponding value.

# Set the width and height of the figure
plt.figure(figsize=(10,10)) 

# Heatmap showing average game score by platform and genre 
sns.heatmap(ign_data, annot=True) 

# Add label for horizontal axis 
plt.xlabel("Genre") 

# Add title
plt.title("Average Game Score, by Platform and Genre")

The relevant code to create the heatmap is as follows:

# Heatmap showing average game score by platform and genre
sns.heatmap(ign_data, annot=True)

This code has three main components:

  • sns.heatmap – This tells the notebook that we want to create a heatmap.
  • ign_data – This tells the notebook to use all of the entries in ign_data to create the heatmap. (It is passed as the first argument, which seaborn calls data.)
  • annot=True – This ensures that the values for each cell appear on the chart. (Leaving this out removes the numbers from each of the cells!)
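The sketch below draws the same kind of heatmap on a small hypothetical platform-by-genre table (made-up scores). It also uses fmt, a real sns.heatmap parameter that controls how the annotations are formatted, and checks that the colored grid holds one value per platform/genre pair. The "Agg" backend is an assumption so the sketch runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical scores: rows are platforms, columns are genres (made-up numbers)
sample_data = pd.DataFrame(
    {"Action": [7.0, 6.3], "Racing": [7.1, 6.2], "Puzzle": [6.9, 6.7]},
    index=pd.Index(["PC", "Wii"], name="Platform"),
)

plt.figure(figsize=(6, 4))
# annot=True writes each value in its cell; fmt=".1f" shows one decimal place
ax = sns.heatmap(sample_data, annot=True, fmt=".1f")
plt.xlabel("Genre")
plt.title("Average Game Score, by Platform and Genre")

# The color mesh holds one value per platform/genre pair
cell_values = np.ravel(ax.collections[0].get_array())
print(len(cell_values))  # 6
```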

How do you read the heatmap?

What patterns can you detect in the table? Each cell shows the average score for one genre on one platform, and the color bar next to the chart maps colors to scores. With seaborn's default color map, lighter cells mark higher average scores and darker cells mark lower ones.
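To double-check what you read from the colors, you can look up the highest- and lowest-scoring cells directly in pandas. This is a small sketch on the same hypothetical table (made-up scores), using stack() to turn the grid into a Series indexed by (platform, genre) pairs.

```python
import pandas as pd

# Hypothetical platform-by-genre table (made-up numbers)
sample_data = pd.DataFrame(
    {"Action": [7.0, 6.3], "Racing": [7.1, 6.2], "Puzzle": [6.9, 6.7]},
    index=pd.Index(["PC", "Wii"], name="Platform"),
)

# stack() turns the grid into one long Series indexed by (platform, genre)
cells = sample_data.stack()

best = cells.idxmax()   # lightest cell under seaborn's default color map
worst = cells.idxmin()  # darkest cell
print(best, cells[best])    # ('PC', 'Racing') 7.1
print(worst, cells[worst])  # ('Wii', 'Racing') 6.2
```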