R
RODBC

RODBC

Getting the data from a system to your analysis tool is as important as the tool itself. Some software have built in functionality and some can be extended with use of connectors. Today, I am going to write about a package in R called RODBC. as the name implies, it is used for connecting ODBC data sources to R for data analysis.

so what is OBC anyways?

ODBC stands for the Open Database Connectivity (ODBC) interface designed by Microsoft which allows applications to access data in DataBase Management Systems (DBMS) using Structured Query Language (SQL) as a standard for accessing the data. Examples of DBMS includes MS Access, SAP HANA, Oracle DB, Microsoft SQL Server, etc. ODBC permits maximum interoperability, meaning a single application can access various DBMS using same structure.

Installation

The package is available in CRAN and can be installed using the standard commands

install.packages(“RODBC”)

Creating the connection to Data Source

In order to connect to data source, you need to define the location and login parameters, and save it as a variable

con <- odbcDriverConnect("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/mydatasource.mdb")

The elements of above script are as follow

  • con , it is the name of the variable for storing credentials
  • odbcDriverConnect , signifies that the driver selection for making connection to DBMS
  • Driver – {Mircosoft Access Driver (*.mdb, *.accdb)} to select the Microsoft access driver for accessing the microsoft access files.
  • DBQ= , location of the database file

or

ch <- odbcConnect(dsn = "PBH",uid = "", pwd = "" )

where

  • ch, it is the name of variable for storing credentials
  • odbcConnect , refers to connection for installed ODBC sources
  • dsn, Data Source Name is the name of data source you have in ODBC sources
  • uid, UserID for the data source
  • pwd, Password for the data source

if you don’t how to check ODBC data sources in system, click here

Importing the table

the table from the database can then be imported and saved as a data frame as follow

df<-sqlFetch(con, 'myTable')

the elements of above code are

  • df, the anme of the data frame for storing the results
  • sqlFetch, telling the package to fetch the data
  • con, the connection variable we created in first step above
  • ‘myTable’ , the table name from the access file

running the above script will save the data from table to R data frame called df.

Saving Data from SQL query

the data can be saved directly from running SQL query on the database tables, this is ideal to fetch on certain records with data of interest as Data bases can span upto several Tera Bytes of data. Only getting the getting of interest is vital

data <- sqlQuery(ch,'SELECT "Name","Age","City"
                     FROM "myTable"
                     GROUP BY "City";')

where

  • data, is the name of data frame for storing the result of query
  • sqlQuery, instructs R to execute SQL query
  • ch, the data source name variable we defined above
  • SELECT, SQL statement for selecting the columns by their names
  • FROM, table name with the data in it for which we defined the column names in select statement
  • GROUP BY, to define the grouping for data being imported

There you have it, a quick tutorial for getting data into R using ODBC. for complete details, the information can be accessed on CRAN for RODBC

Leave a Reply

Your email address will not be published. Required fields are marked *