Data Visualization with seaborn part-1

[Image: included just for reference, because it’s the first thing that comes to my mind when I think of seaborn.]

In today’s post, I will be introducing seaborn, a charting library for Python for making eye-catching graphs. Data visualization is an important tool in data science. With the help of visualization, you get a feel for what the data actually looks like. In this article, we will explore some of the features of seaborn.

Installing seaborn

To install the seaborn library, enter the following command in a terminal or Windows PowerShell.

pip install seaborn
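
If you want to confirm that the install worked before opening a notebook, one option is to ask Python whether it can locate the package. This is only a minimal sketch using the standard library; the helper name is_installed is made up for this example.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once `pip install seaborn` has succeeded in this environment
print("seaborn installed:", is_installed("seaborn"))
```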


Using Jupyter notebooks

Once you have installed seaborn, you can call the library in a Jupyter notebook.

Set up the notebook

There are a few libraries that you need to load in your Jupyter notebook (hereinafter referred to simply as the notebook). You will need to run the following code to load the necessary libraries. Notice that it prints Setup Complete as output.

import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
print("Setup Complete")
Setup Complete

Load the data

In this tutorial, we’ll work with a dataset of historical FIFA rankings for six countries: Argentina (ARG), Brazil (BRA), Spain (ESP), France (FRA), Germany (GER), and Italy (ITA). The dataset is stored as a CSV file (short for comma-separated values file).

To load the data into the notebook, we’ll use the pandas read_csv function in two steps:

  • begin by specifying the location (or filepath) where the dataset can be accessed, and then
  • use the filepath to load the contents of the dataset into the notebook.

# Path of the file to read
fifa_filepath = "../input/fifaratings.csv"

# Read the file into a variable fifa_data
fifa_data = pd.read_csv(fifa_filepath, index_col="Date", parse_dates=True)

Note that the code cell above has four different lines.

Lines beginning with #

Two of the lines are preceded by a hash symbol (#) and contain text that appears faded and italicized.

Both of these lines are completely ignored by the computer when the code is run, and they only appear here so that any human who reads the code can quickly understand it. We refer to these two lines as comments, and it’s good practice to include them to make sure that your code is readily interpretable.

Executable code

The other two lines are executable code, or code that is run by the computer (in this case, to find and load the dataset).

The first line sets the value of fifa_filepath to the location where the dataset can be accessed. In this case, we’ve provided the filepath for you (in quotation marks). Note that the comment immediately above this line of executable code provides a quick description of what it does!

The second line sets the value of fifa_data to contain all of the information in the dataset. This is done with pd.read_csv; a detailed tutorial on how to do this can be found here. The call takes three arguments:

  • fifa_filepath – The filepath for the dataset always needs to be provided first.
  • index_col="Date" – When we load the dataset, we want each entry in the first column to denote a different row. To do this, we set the value of index_col to the name of the first column ("Date", found in cell A1 of the file when it’s opened in Excel).
  • parse_dates=True – This tells the notebook to interpret each row label as a date (as opposed to a number or other text with a different meaning).
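
To see what index_col and parse_dates do without needing the real file, here is a small sketch that reads the same kind of CSV from an in-memory string. The two rows are copied from the output shown later in this post and trimmed to two countries; the names csv_text and sample are just placeholders I picked.

```python
import io

import pandas as pd

# Two rows in the same layout as the FIFA file (trimmed to two countries)
csv_text = """Date,ARG,BRA
1993-08-08,5.0,8.0
1993-09-23,12.0,1.0
"""

# io.StringIO lets read_csv treat the string as if it were a file
sample = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

print(type(sample.index).__name__)  # the row labels were parsed as dates
print(list(sample.columns))         # "Date" became the index, not a column
```

Dropping parse_dates=True from the call would leave the row labels as plain text, which is why the argument matters for time-based charts.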

These details will make more sense soon, when you have a chance to load your own dataset.

By the way, you might have noticed that these lines of code don’t have any output (whereas the lines of code you ran earlier in the notebook returned Setup Complete as output). This is expected behavior — not all code will return output, and this code is a prime example!

Examine the data

Now, we’ll take a quick look at the dataset in fifa_data, to make sure that it loaded properly.

We will first use the .head() method to return the first five rows of the dataset. To do this:

  • begin with the variable containing the dataset (in this case, fifa_data), and then
  • follow it with .head().

You can see this in the line of code below.

# Prints the first 5 rows of the data
fifa_data.head()

Out[3]:

             ARG  BRA   ESP   FRA  GER  ITA
Date
1993-08-08   5.0  8.0  13.0  12.0  1.0  2.0
1993-09-23  12.0  1.0  14.0   7.0  5.0  2.0
1993-10-22   9.0  1.0   7.0  14.0  4.0  3.0
1993-11-19   9.0  4.0   7.0  15.0  3.0  1.0
1993-12-23   8.0  3.0   5.0  15.0  1.0  2.0

Check now that the first five rows agree with the image of the dataset (from when we saw what it would look like in Excel) above.
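
.head() also accepts the number of rows to return, which is handy when five is too many or too few. Here is a quick sketch on a made-up table (the toy DataFrame and its rank column are invented for illustration and are not part of the FIFA data):

```python
import pandas as pd

# Toy data, invented purely for this example
toy = pd.DataFrame({"rank": [3, 1, 4, 1, 5, 9, 2, 6]})

print(len(toy.head()))   # with no argument, head() returns the first 5 rows
print(len(toy.head(2)))  # ask for just the first 2 rows
print(toy.shape)         # (rows, columns) of the full table
```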

Plotting the data

In Python, making graphs and charts is referred to as plotting. You will come across the term "plotting data" all the time; it simply means generating graphs of the data.

Check out the following simple code that defines our graph.

# Set the width and height of the figure
plt.figure(figsize=(16,6))

# Line chart showing how FIFA rankings evolved over time 
sns.lineplot(data=fifa_data)

Out[4]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f7adde8add8>

This code isn’t making much sense, is it? Well, hold your horses because the fun is just about to begin.
