‘R’ is a programming language for data analysis and statistics. It is free, and very widely used by professional statisticians, academics and data science enthusiasts. It is also very popular in certain application areas, including bioinformatics. It has many built-in functions and libraries, and is extensible, allowing users to define their own functions and procedures using R, C or Fortran. It also has a simple object system. So, as an introduction to R, I have put together the following content.
In the field of data science, both R and Python are used. Some people prefer R, while others prefer Python; which one you use is entirely up to you. From here on, I will start an introduction to R, which will grow over time into a series of tutorials covering R's data handling capabilities.
So, what is R?
R is an integrated suite of software facilities for data manipulation, calculation and graphical display. Among other things it has
- an effective data handling and storage facility,
- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for data analysis,
- graphical facilities for data analysis and display either directly at the computer or on hardcopy, and
- a well developed, simple and effective programming language (called ‘S’) which includes conditionals, loops, user defined recursive functions and input and output facilities. (Indeed most of the system supplied functions are themselves written in the S language.)
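A short sketch of some of the facilities listed above: element-wise arithmetic on a vector, matrix operators, a user-defined recursive function with a conditional, and a simple loop. The name factorial_rec is our own illustrative choice (base R already provides factorial()).

```r
# Arithmetic on a vector is applied element by element
x <- c(1, 2, 3)
print(x * 2)

# A 2 x 2 matrix, filled column by column: rows are (1, 3) and (2, 4)
m <- matrix(1:4, nrow = 2)
print(t(m))      # transpose
print(m %*% m)   # matrix product (not element-wise)
print(solve(m))  # matrix inverse

# A user-defined recursive function with a conditional
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}
print(factorial_rec(5))

# A simple loop
for (i in 1:3) {
  print(i^2)
}
```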
We describe R as an “environment” to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.
R is very much a vehicle for newly developing methods of interactive data analysis. It has developed rapidly, and has been extended by a large collection of packages. However, most programs written in R are essentially ephemeral, written for a single piece of data analysis.
R and Statistics
Our introduction to the R environment did not mention statistics, yet many people use R as a statistics system. We prefer to think of it as an environment within which many classical and modern statistical techniques have been implemented. A few of these are built into the base R environment, but many are supplied as packages. There are about 25 packages supplied with R (called “standard” and “recommended” packages) and many more are available through the CRAN (The Comprehensive R Archive Network) family of Internet sites (via https://CRAN.R-project.org) and elsewhere. More details on packages are given later in this post.
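If you want to see which standard and recommended packages are on your own machine, installed.packages() can filter by priority. A minimal sketch; the recommended list may be empty on installations that leave those packages out.

```r
# "base" priority packages ship with every R installation;
# "recommended" packages are bundled with most binary distributions.
base_pkgs <- rownames(installed.packages(priority = "base"))
rec_pkgs  <- rownames(installed.packages(priority = "recommended"))
print(base_pkgs)
print(rec_pkgs)

# Total number of packages installed in your libraries, including any from CRAN
print(nrow(installed.packages()))
```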
Most classical statistics and much of the latest methodology is available for use with R, but users may need to be prepared to do a little work to find it.
There is an important difference in philosophy between S (and hence R) and the other main statistical systems. In S a statistical analysis is normally done as a series of steps, with intermediate results being stored in objects. Thus, whereas SAS and SPSS will give copious output from a regression or discriminant analysis, R will give minimal output and store the results in a fit object for subsequent interrogation by further R functions.
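To illustrate the "fit object" idea, here is a sketch using the built-in cars data set (stopping distance versus speed). lm() stores the whole regression in an object, and other functions then pull out just the pieces you ask for.

```r
# Fit a linear regression of stopping distance on speed;
# the result is stored in 'fit' rather than printed in full.
fit <- lm(dist ~ speed, data = cars)

# Printing the object gives only a brief summary (the call and coefficients)
print(fit)

# Further functions interrogate the stored fit object
print(coef(fit))             # estimated coefficients
print(summary(fit))          # standard errors, t-tests, R-squared
print(head(residuals(fit)))  # first few residuals
print(anova(fit))            # analysis of variance table
```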
The latest version of base R can be obtained from https://cran.r-project.org/. Installation is straightforward, just like any other software. At the time of writing this article, the latest version of R is 4.0.1.
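Once R is installed, you can check which version you are running from the R prompt. A small sketch; the 4.0.0 threshold is only an example.

```r
# Print the full version string of the running R installation
print(R.version.string)

# getRversion() returns a version object that can be compared with a string
if (getRversion() >= "4.0.0") {
  message("R 4.0.0 or later is installed")
} else {
  message("An older R version is installed; consider upgrading")
}
```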
To get started and see what R is capable of, R provides a demo() command that showcases some of its base capabilities, which can be expanded with packages. To see the list of available demos, type demo() at the R prompt.
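The commands for listing demos are shown below. With no arguments, demo() lists the demos in the currently attached packages; the package argument narrows the list to a single package.

```r
# List the demos available in the attached packages (base, stats, graphics, ...)
demo()

# List only the demos shipped with one package, here 'graphics'
demo(package = "graphics")
```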
You will be greeted with a list of the available demos.
To run a particular demo, pass its name to demo(), for example demo(graphics).
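For example, the graphics package ships a demo called graphics. In an interactive session, R pauses between plots and waits for you to press Return.

```r
# Run the 'graphics' demo; interactively, R prompts
# "Hit <Return> to see next plot:" between plots.
demo(graphics)
```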
You will notice that the base R you download from CRAN works much like a terminal: you type text commands to perform tasks. Luckily, a very popular IDE called RStudio is available. RStudio makes working with R a lot easier by providing point-and-click methods for basic tasks.
Note: You need to have base R installed before you can use RStudio.
You can get RStudio from their official website at https://rstudio.com/products/rstudio/download/
R itself is free and open source. RStudio also comes in paid commercial editions, but its open-source RStudio Desktop edition is free to use. Once you get going with R, you will fall in love with RStudio as well.
Just as Python has packages that add functionality to the language, so does R. At the time of writing, there are 15,842 packages for various tasks on CRAN, the official repository for R packages.
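Because that count grows constantly, you may want to check the current number yourself. A sketch that assumes an internet connection; the mirror URL https://cloud.r-project.org is set explicitly so R does not ask you to choose one.

```r
# Download the current package index from a CRAN mirror
cran <- available.packages(repos = "https://cloud.r-project.org")

# One row per package currently available on CRAN
print(nrow(cran))
```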
Installing the Package
To install a package, you need to know its name. A complete list of packages is maintained on CRAN. Don't worry about package names for now; I will point out the packages we are going to use in upcoming tutorials. To install a package, call install.packages() with the package name in quotes, for example install.packages("dplyr").
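A slightly more defensive version of the install step is sketched below. It skips the download if dplyr is already installed, and it sets the repos argument (here the cloud.r-project.org mirror, our choice) so that no mirror-selection prompt appears. An internet connection is assumed.

```r
# Install dplyr from CRAN only if it is not already installed
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}

# Several packages can be installed in one call by combining names with c():
# install.packages(c("dplyr", "ggplot2"))
```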
In this example, the package name is given in quotes inside the parentheses. You can install several packages at once by combining their quoted names with c(), e.g. install.packages(c("dplyr", "ggplot2")). Installing a package also installs its dependencies, so you don't need to install them separately.
Loading a Package
Packages need to be loaded before they can be used. We load a package with library(), for example library(dplyr).
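To keep this sketch runnable without downloading anything, it uses splines, a package that is installed with every copy of R but not attached at startup. library(dplyr) behaves the same way once dplyr is installed.

```r
# Usually FALSE in a fresh session: splines is installed but not attached
print("package:splines" %in% search())

# library() attaches the package, making its functions available by name
library(splines)
print("package:splines" %in% search())

# bs() (a B-spline basis) now works without the splines:: prefix
basis <- bs(1:10, df = 4)
print(dim(basis))
```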
You will notice that, just as installing a package installs its dependencies, loading a package also loads the packages it depends on. (A dependency is a separate, independent package that provides code the main package needs in order to work.)
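You can watch dependencies being loaded by comparing the loaded namespaces before and after a library() call. This sketch assumes lattice, a recommended package bundled with standard R installations, which imports the grid package. In a fresh session, grid should appear among the newly loaded namespaces.

```r
# Record which package namespaces are loaded before attaching lattice
before <- loadedNamespaces()

# lattice imports grid, so loading lattice pulls grid in as well
library(lattice)

# Namespaces that were loaded as a side effect of library(lattice)
print(setdiff(loadedNamespaces(), before))

# The dependencies declared in lattice's DESCRIPTION file
print(packageDescription("lattice")$Imports)
```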
We will get into more details in future posts. You may have noticed that the interface is simple, organized and quite similar to Python's. Please do read the next post about getting started with R, and stay tuned for more details on R.