Data Visualization with seaborn part-1
In today’s post, I will introduce seaborn, a Python charting library for making eye-catching graphs. Data visualization is an important tool in data science: it gives you a feel for what the data actually looks like. In this article, we will explore some of seaborn’s features.
To install the seaborn library, enter the following command in a terminal or Windows PowerShell.
pip install seaborn
Using Jupyter notebooks
Once you have installed seaborn, you can call the library in a Jupyter notebook.
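As a quick sanity check that the installation worked, a cell along these lines should run without an ImportError. This is only a sketch; the version number it prints depends on what pip installed on your machine.

```python
# Confirm that seaborn can be imported and report which version is installed.
import seaborn as sns

seaborn_version = sns.__version__
print("seaborn version:", seaborn_version)
```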
Set up the notebook
There are a few libraries that you need to load in your Jupyter notebook (hereinafter simply "the notebook"). Run the following code to load them. Notice that it prints Setup Complete as output.
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
print("Setup Complete")
Load the data
In this tutorial, we’ll work with a dataset of historical FIFA rankings for six countries: Argentina (ARG), Brazil (BRA), Spain (ESP), France (FRA), Germany (GER), and Italy (ITA). The dataset is stored as a CSV file (short for comma-separated values file).
To load the data into the notebook, we’ll use the pandas read_csv function in two steps:
- begin by specifying the location (or filepath) where the dataset can be accessed, and then
- use the filepath to load the contents of the dataset into the notebook.
# Path of the file to read
fifa_filepath = "../input/fifaratings.csv"
# Read the file into a variable fifa_data
fifa_data = pd.read_csv(fifa_filepath, index_col="Date", parse_dates=True)
Note that the code cell above has four different lines.
Lines beginning with #
Two of the lines are preceded by a hash sign (#) and contain text that appears faded and italicized.
Both of these lines are completely ignored by the computer when the code is run, and they only appear here so that any human who reads the code can quickly understand it. We refer to these two lines as comments, and it’s good practice to include them to make sure that your code is readily interpretable.
The other two lines are executable code, or code that is run by the computer (in this case, to find and load the dataset).
The first line sets the value of fifa_filepath to the location where the dataset can be accessed. In this case, we’ve provided the filepath for you (in quotation marks). Note that the comment immediately above this line of executable code provides a quick description of what it does!
The second line sets the value of fifa_data to contain all of the information in the dataset. This is done with pd.read_csv. A detailed tutorial on how to do this can be found here. In our case, the call takes three arguments:
- fifa_filepath – The filepath for the dataset always needs to be provided first.
- index_col="Date" – When we load the dataset, we want each entry in the first column to denote a different row. To do this, we set the value of index_col to the name of the first column ("Date", found in cell A1 of the file when it’s opened in Excel).
- parse_dates=True – This tells the notebook to understand each row label as a date (as opposed to a number or other text with a different meaning).
These details will make more sense soon, when you have a chance to load your own dataset.
By the way, you might have noticed that these lines of code don’t have any output (whereas the lines of code you ran earlier in the notebook returned Setup Complete as output). This is expected behavior: not all code will return output, and this code is a prime example!
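If you would like to see what index_col and parse_dates actually change, here is a small sketch that loads the same text twice. It assumes nothing about the real FIFA file: csv_text is a hypothetical three-row sample with made-up values, read from memory with io.StringIO instead of from disk.

```python
import io

import pandas as pd

# Hypothetical sample shaped like the FIFA file; the values are made up.
csv_text = """Date,ARG,BRA,ESP
2000-01-01,3,1,2
2000-02-01,2,1,3
2000-03-01,1,2,3
"""

# Default load: "Date" is just another column, and rows are numbered 0, 1, 2.
plain = pd.read_csv(io.StringIO(csv_text))

# Load as in the tutorial: "Date" becomes the row labels, parsed as dates.
indexed = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

print(plain.columns.tolist())        # ['Date', 'ARG', 'BRA', 'ESP']
print(indexed.columns.tolist())      # ['ARG', 'BRA', 'ESP']
print(type(indexed.index).__name__)  # DatetimeIndex
```

With the second form, the dates are no longer a data column. They label the rows, which is what lets a plotting function put time on the horizontal axis later on.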
Examine the data
Now, we’ll take a quick look at the dataset in fifa_data, to make sure that it loaded properly.
We will first use the .head() method to return the first five rows of the dataset. To use it:
- begin with the variable containing the dataset (in this case, fifa_data), and then
- follow it with .head().
You can see this in the line of code below.
fifa_data.head()
Check now that the first five rows agree with the image of the dataset (from when we saw what it would look like in Excel) above.
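If you do not have the FIFA file at hand, the behavior of .head() can be tried on any DataFrame. The sketch below uses a hypothetical eight-row table of made-up rankings (the name sample and its values are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical stand-in for fifa_data: eight rows of made-up rankings.
dates = pd.date_range("2000-01-01", periods=8, freq="D")
sample = pd.DataFrame(
    {"ARG": range(1, 9), "BRA": range(8, 0, -1)},
    index=dates,
)
sample.index.name = "Date"

first_five = sample.head()   # with no argument, head() returns 5 rows
first_two = sample.head(2)   # pass a number to ask for a different count

print(first_five)
print(len(first_five), len(first_two))  # 5 2
```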
Plotting the data
In Python, making graphs and charts is referred to as plotting. You will see the term "plotting data" all the time; it simply means generating graphs of the data.
Check out the following simple code that defines our graph.
# Set the width and height of the figure
plt.figure(figsize=(16,6))
# Line chart showing how FIFA rankings evolved over time
sns.lineplot(data=fifa_data)
<matplotlib.axes._subplots.AxesSubplot at 0x7f7adde8add8>
This code isn’t making much sense, is it? Well, hold your horses because the fun is just about to begin.