In the last post, we saw an introduction to R and discussed its uses. Since the aim here is to be a guide for absolute beginners, I will start by setting up the environment before we move further. We will also discuss some basic functionality along the way to get started with R.
Base R or RStudio?
As explained earlier, nearly everything that can be done in RStudio can also be done in base R. We will be working with RStudio, as it is more user friendly and easier to adopt.
Getting Started with RStudio
Once you launch RStudio, you will notice that the default screen is split into four parts:
- The notebook section, where you write scripts that can be saved for future reference. The code in the notebook can be exported to share the analysis or to make reports. The code is executed only when you run it.
- The environment section, where you can see all the loaded datasets and their variables.
- The console pane, where the code you run from the notebook section is executed. You can also type code directly here to run it, but it is not saved.
- The package manager, where you can see a list of all installed packages and load a package by selecting the checkbox next to it.
You must have noticed by now that there are tabs above each pane. These switch between different input and output views.
Setting the Theme
Once you get started with RStudio, you might want to customize the look and feel of the editor to match your own style. Many of you will wonder why you should customize the theme when the default one is good enough. It is a matter of personal choice. I feel comfortable with a dark look; you might prefer a bright white environment. So here goes:
Go to Tools, and then click Global Options.
In the options window that opens, select Appearance. Here you will see all the theme and font options available for customization.
Installing the Packages
As I said earlier, with RStudio a lot of tasks are simple point-and-click tasks. To install a package, simply click Install in the package manager pane.
Enter the package name, and it will search and show similarly named packages for you to choose from.
To install multiple packages, simply enter each new package name after a comma.
Make sure to check the "install dependencies" checkbox to automatically install the dependencies for the package.
Once you click Install, you will notice that the install command is entered and executed in the console pane. Note that it also lists the other packages that are going to be installed as dependencies.
In the top right corner of the console window, a red light appears while the code is running.
This indicates that the code is being executed and we need to wait for it to complete. There is also a broom next to the red light, which is used to clear the contents of the console pane.
Once the installation is complete, you will get the prompt back and the red light will be gone.
Another Way to Install a Package
You can also install packages by typing the install.packages() command directly into the console. The console will not suggest package names, so you need to know the exact package name.
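As a minimal sketch, the console commands look like the following. The package names readr and dplyr are only examples, and a working connection to CRAN is assumed.

```r
# Install a single package; the name must be spelled exactly.
install.packages("readr")

# Install several packages at once by passing a character vector.
# dependencies = TRUE asks R to also install the packages these depend on.
install.packages(c("readr", "dplyr"), dependencies = TRUE)
```

Passing a character vector is the console counterpart of entering comma-separated names in the install dialog.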
The above method also works when using base R.
Loading the Package
You can click the checkbox next to the package in the package manager, or load it from the console with the library() command.
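A short sketch of loading a package from the console, assuming readr has already been installed:

```r
# Attach readr for the current session (it must already be installed).
library(readr)

# List the attached packages to confirm that readr is now loaded.
search()
```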
Running library(readr) loads the readr package into the environment.
Please note that you need to load a package every time you restart R before you can use it.
Once you have entered code in the R notebook, you need to execute it. This can be done by highlighting the code you have written and pressing the key combination
CTRL + Enter
or, if you need to execute the whole notebook, you can click “Run”.
Loading the Data
Now that you know the basic elements of RStudio, let's try loading some data. One change you will notice coming from Python is that data loaded in R is already a data frame. You don't need Pandas here; data frames are built into R.
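To illustrate that data frames are part of base R, here is a small sketch that builds one by hand. The names and scores are made-up values used only for illustration.

```r
# Build a small data frame using only base R; no extra package is loaded.
scores <- data.frame(
  name  = c("Asha", "Ben", "Chen"),
  score = c(82, 91, 77)
)

class(scores)   # "data.frame"
str(scores)     # compact overview of the columns and their types
```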
The data can be imported from the command line or with the data import wizard in RStudio.
To use the wizard, click Import Dataset in the environment section.
Here you can see options for importing different types of data. If you need to import any other type of data, there is usually a package for that.
For now we will use the readr package to import the data. Click on From Text (readr).
You will notice that it gives you an easy-to-use import dialog. It also provides some pre-processing options, such as:
- First row as column names
- Delimiter used in the data, e.g. commas or tabs
- Setting the name of the imported data set
At the bottom right, you get a preview of the code that will be executed.
Once you click Import, the code will run and a preview of the data is shown.
After the import, you will notice the following:
- A preview of the imported data set.
- The code executed to import the data.
- The environment showing the list of loaded datasets.
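The same import can be done from the command line. The sketch below shows the general shape of such a call and is not the exact code the wizard generates. The file contents and the name sales are hypothetical, and readr is assumed to be installed.

```r
library(readr)

# Create a small CSV file so the example is self-contained (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,amount", "1,10.5", "2,20.0", "3,7.25"), csv_path)

# col_names = TRUE treats the first row as column names.
# The name on the left of <- becomes the name of the imported data set.
sales <- read_csv(csv_path, col_names = TRUE)

# For other delimiters, read_delim() takes the delimiter explicitly,
# e.g. read_delim(csv_path, delim = "\t") for a tab-separated file.

sales
```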
Getting the First 5 and Last 5 Records from a Data Set
Similar to Python, you can use the head and tail commands.
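As a quick sketch, here are both commands applied to iris, a sample data frame that ships with R. Any data frame you have imported can be used in its place.

```r
# iris is a sample data frame that ships with R.
head(iris, 5)   # first 5 rows
tail(iris, 5)   # last 5 rows

# With no second argument, both functions return 6 rows by default.
nrow(head(iris))
```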
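You must have noticed that these commands work the same way as in Python. The only difference is that in Python (with Pandas) they are methods called on the data frame, such as df.head() and df.tail(), whereas in R they are functions that take the data frame as an argument, such as head(df) and tail(df).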
That’s all for today for getting started with R. Please stay tuned for more exciting stuff.