Connecting R to An SQL Database (PostgreSQL, MySQL, Oracle)
While reading data into R isn't difficult, there are a variety of reasons why you would want R to directly pull data from a database. Whatever your situation, I will take you through the process of connecting R to that DB and querying data from it.
My favorite SQL platform is PostgreSQL because of its simplicity and versatility. In other words, I use R for most things and PostgreSQL works well with R. You can obtain it here, if needed. I suggest that you download RStudio if you do not already have it as it makes installing packages and other essential tasks easier.
The following code is essentially the same for all SQL platforms, the difference being which R package you load. For postgreSQL we use,
library(RPostgreSQL)
# for a local instance of postgres
con<-dbConnect(dbDriver("PostgreSQL"), dbname="myDB",host="localhost",port="5432")
# or to connect to a server it will look something like...
con<-dbConnect(dbDriver("PostgreSQL"), dbname="myDB",host="hostname",port="xxxx",username="user_name",password="myPassword")
If you don't want your username and password to appear in the R script, you can save a spreadsheet document as a .csv file with your username and password in them and read that in. For example,
library(magrittr)
usepass<-read.csv("credentialsDocuments",header=F,stringsAsFactors=F) %>% as.matrix()
con<-dbConnect(dbDriver("PostgreSQL"), dbname="myDB",host="hostname",port="xxxx",username=usepass[1],password=usepass[2])
For MySQL, it is essentially the same, but with a different R package.
library(RMySQL)
con<-dbConnect(dbDriver("MySQL"), dbname="myDB",host="hostname",port="xxxx",username="user_name",password="myPassword")
Connecting to an Oracle DB can be a bit trickier because the "ROracle" package doesn't download and install like most other R packages. There are a few things that need doing before you are able to install the package.
You can install the package from source, so I downloaded it and installed it from a local directory. You can also use Rstudio to install the package and it will ask you if you would like to install from source. Select 'yes'.
With that, the ROracle package should successfully install. Your code to connect will be essentially the same as connecting to the other platforms, at this point.
***Update (June 2019): I recently installed the latest version of R (3.6.0), and I had some additional struggles trying to reinstall ROracle. Upon attempting to install from source, it had an error claiming it couldn't find a header file called "ociver.h" even though that file is in instantclient_18_3/sdk/include. The way around this was to copy that file over to R's include file (C:/program files/R/R 3.6.0/include). Copy ociver.h from instant client to that include file directory and the installation should complete successfully.
library(ROracle)
con<-dbConnect(dbDriver("Oracle"), dbname="myDB",host="hostname",port="xxxx",username="user_name",password="myPassword")
Querying the database will be the same regardless of SQL platform.
query<-'SELECT *
FROM mytable'
myDat<-dbGetQuery(con,query)
dbDisconnect(con)
That should get you up and running with R pulling data from a database. I hope that was helpful, and if you have any further questions, feel free to reach out to me at https://worth.yt/lazybayesian
My favorite SQL platform is PostgreSQL because of its simplicity and versatility. In other words, I use R for most things and PostgreSQL works well with R. You can obtain it here, if needed. I suggest that you download RStudio if you do not already have it as it makes installing packages and other essential tasks easier.
The following code is essentially the same for all SQL platforms, the difference being which R package you load. For postgreSQL we use,
library(RPostgreSQL)
# for a local instance of postgres
con<-dbConnect(dbDriver("PostgreSQL"), dbname="myDB",host="localhost",port="5432")
# or to connect to a server it will look something like...
con<-dbConnect(dbDriver("PostgreSQL"), dbname="myDB",host="hostname",port="xxxx",username="user_name",password="myPassword")
If you don't want your username and password to appear in the R script, you can save a spreadsheet document as a .csv file with your username and password in them and read that in. For example,
library(magrittr)
usepass<-read.csv("credentialsDocuments",header=F,stringsAsFactors=F) %>% as.matrix()
con<-dbConnect(dbDriver("PostgreSQL"), dbname="myDB",host="hostname",port="xxxx",username=usepass[1],password=usepass[2])
For MySQL, it is essentially the same, but with a different R package.
library(RMySQL)
con<-dbConnect(dbDriver("MySQL"), dbname="myDB",host="hostname",port="xxxx",username="user_name",password="myPassword")
Connecting to an Oracle DB can be a bit trickier because the "ROracle" package doesn't download and install like most other R packages. There are a few things that need doing before you are able to install the package.
- You'll need to first install oracle instant client found here. ROracle needs instant client to work.
- Some environmental variables need to be set within the R environment so the package can find needed files when it attempts to install. Set the path to wherever you put the instant client download. Assuming you put the files on the C drive, it would look like this.
You can install the package from source, so I downloaded it and installed it from a local directory. You can also use Rstudio to install the package and it will ask you if you would like to install from source. Select 'yes'.
install.packages("file_path/ROracle_1.3-2.tar.gz",repos=NULL,type="source")
With that, the ROracle package should successfully install. Your code to connect will be essentially the same as connecting to the other platforms, at this point.
***Update (June 2019): I recently installed the latest version of R (3.6.0), and I had some additional struggles trying to reinstall ROracle. Upon attempting to install from source, it had an error claiming it couldn't find a header file called "ociver.h" even though that file is in instantclient_18_3/sdk/include. The way around this was to copy that file over to R's include file (C:/program files/R/R 3.6.0/include). Copy ociver.h from instant client to that include file directory and the installation should complete successfully.
library(ROracle)
con<-dbConnect(dbDriver("Oracle"), dbname="myDB",host="hostname",port="xxxx",username="user_name",password="myPassword")
Querying the database will be the same regardless of SQL platform.
query<-'SELECT *
FROM mytable'
myDat<-dbGetQuery(con,query)
dbDisconnect(con)
That should get you up and running with R pulling data from a database. I hope that was helpful, and if you have any further questions, feel free to reach out to me at https://worth.yt/lazybayesian
Comments
Post a Comment