Posts

Integrating Data Management and Data Analytics with R and PostgreSQL.

Those wanting to be successful in data analytics increasingly have to become well versed in managing their data. That means it is no longer sufficient to just learn R or an analytic platform; you also need to be competent with SQL or a similar database platform. As I am a huge advocate of open source applications, I will be using PostgreSQL, although that certainly isn't the only SQL platform that R can work with. I've gotten R to work with MySQL, Oracle, PostgreSQL, and Microsoft SQL Server. I already have a post on how to connect R to these platforms (https://www.lazybayesian.com/2019/05/connecting-r-to-sql-database-postgresql.html), though I don't get into Microsoft SQL Server there because it is a painful (not worth it) process if you are running OSX. You would end up using the RODBC package or something similar, but you also need to use Homebrew to install other tools on your machine just to get that to work, and it just keeps going. If y...

Which Game is the Scariest? Alien: Isolation, Dead Space, Dead Space 2, or Silent Hill 2? An R Halloween Analysis!

I wanted to get into the Halloween spirit by doing some kind of horror-themed analytics post. The idea of combining R, data analytics, and the macabre isn't as straightforward as some may think (yes, that was a joke). While I don't care for horror movies, for some reason I enjoy survival horror video games. Not playing them, of course; I'm far too squeamish for that. I usually watch YouTube videos of other people playing them to spare myself a panic attack. I'm the kind of guy who would start playing the game and, once the atmosphere became intense, just go "NOPE", turn off the game, and walk away. Of the survival horror games I've seen, the Dead Space franchise is up there. I also love the Alien franchise, though that franchise has suffered from a number of awful releases (including movies). Alien: Isolation is a gem whose intense atmosphere makes every footstep nerve-racking. Lastly, I wanted to include another game that I haven't see...

Online Statistics Tutor: Linear Regression - Understanding and Interpreting Linear Regression

Simple linear regression is a staple in every statistical toolbox. The idea is to estimate a linear relationship between a dependent variable (Y, or your outcome) and an independent variable (X, or your predictor variable). That is, we estimate the equation of the line through the data points that minimizes the sum of the squared vertical distances from the data points to that line. From this we can better understand how Y changes with X, and the analysis can be used for predictive purposes as well. In this post I plan on addressing only some basic principles of regression in order to best understand what it is and how to use it. I will focus on scatterplots and linear relationships, the point-slope equation for a line and how it works, estimating slope coefficients, interpreting the slope, and a brief mention of other regression concepts (which I may address in later posts).
Scatterplots and Linear Relationships
If you are not already familiar with what a scatterplot is,...
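As a rough illustration of estimating the slope, the sketch below fits a least-squares line to simulated data (the data are made up for this example, not taken from the post) using the closed-form formulas, and checks the result against lm(). The plot only draws in an interactive session.

```r
# Minimal sketch: ordinary least squares for simple linear regression.
# The data are simulated for illustration only.
set.seed(42)
x <- runif(50, min = 0, max = 10)      # independent (predictor) variable
y <- 3 + 2 * x + rnorm(50, sd = 1.5)   # dependent (outcome) variable

# Closed-form OLS estimates:
#   slope     b1 = cov(x, y) / var(x)
#   intercept b0 = mean(y) - b1 * mean(x)
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)

# lm() minimizes the same sum of squared vertical distances
fit <- lm(y ~ x)
print(coef(fit))

# Interpretation: b1 is the estimated change in y for a one-unit increase in x
cat("Estimated slope:", round(b1, 3), "\n")

# Scatterplot with the fitted line (only when running interactively)
if (interactive()) {
  plot(x, y, main = "Simulated data with least-squares line")
  abline(fit, col = "red")
}
```

Computing the formulas by hand is only for understanding; in practice lm() also provides standard errors and other diagnostics through summary(fit).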

Learn to Code in R: Reading in External Data Files

One skill that everyone using R should have is how to read in external data files. Many people with some exposure to R will have some familiarity with this skill but little knowledge of the many formats R can handle, often because their exposure came from a single class or a one-off project. My hope is to give the reader a broader understanding of R's ability to handle a number of data formats. In this post, I will cover how to read in .csv, Stata, SPSS, SAS, and Excel spreadsheet files; some formatting options and different abilities you ought to know; some explanations regarding help documentation and using function arguments/options; and saving and loading Rdata files for minimal hassle once the data is just the way you want it.
Reading in Text Files and Function Options
The basic function for reading in data is read.table(). I mention this one first because related functions such as read.csv() and read.delim() are built on it: they call read.table() with different default arguments. In ...
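A minimal sketch of the base-R workflow described above: write a small hypothetical data frame to a temporary .csv file, read it back with both read.csv() and read.table(), then save and reload it as an .RData file. The Stata, SPSS, SAS, and Excel formats from the post need add-on packages and are not shown here.

```r
# Minimal sketch: reading a .csv two ways, then saving/loading an .RData file.
# The data frame is a hypothetical example; files go to a temporary directory.
df <- data.frame(id = 1:3, score = c(88.5, 92, 79), group = c("a", "b", "a"))

csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

# read.csv() is read.table() with header = TRUE, sep = "," and a few other defaults
from_csv   <- read.csv(csv_path)
from_table <- read.table(csv_path, header = TRUE, sep = ",")
str(from_csv)

# save() writes named objects to a file; load() restores them under the same names
rdata_path <- tempfile(fileext = ".RData")
save(from_csv, file = rdata_path)
rm(from_csv)
load(rdata_path)   # from_csv is back in the workspace
```

Because load() restores objects under their original names, it can silently overwrite an object of the same name in your workspace; saveRDS()/readRDS() are an alternative when you want to assign the result to a name of your choosing.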

Online Statistics Tutor: Analyzing Nominal and Ordinal Data

While nominal (categorical) and ordinal (rank order) data can't be used in standard introductory analyses like the t- or F-tests, there are still a number of options when working with these kinds of data. In this post I will point out a few of these, specifically: producing tables of counts and cross-tabs; the chi-square test of association; and Goodman and Kruskal's gamma, a correlation coefficient for ordinal data. I will provide code on how to perform each in R. Let's first start with producing count tables in R. This is the most basic way to summarize nominal or ordinal data. In the code below, I've created a couple of sets of nominal and ordinal values containing all available values and then sampled from them. The output from sampling from the set of 4 colors and the 5 items of a Likert scale is saved as "colset" and "ordset". The size argument in the sample function means that this output will be 100 elements long. To produce a table of counts for these da...
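The sketch below walks through the three techniques listed above under some assumptions: the color and Likert sets are stand-ins for the post's "colset" and "ordset", and gk_gamma() is a hypothetical helper written here to show how gamma is computed from concordant and discordant pairs, not the function the post uses. Gamma is only meaningful when both variables are ordinal, so a second ordinal variable, ordset2, is sampled for that step.

```r
# Minimal sketch: count tables, chi-square test, and Goodman and Kruskal's gamma.
set.seed(1)
colors <- c("red", "blue", "green", "yellow")
likert <- c("Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree")

colset  <- sample(colors, size = 100, replace = TRUE)
ordset  <- factor(sample(likert, size = 100, replace = TRUE),
                  levels = likert, ordered = TRUE)
ordset2 <- factor(sample(likert, size = 100, replace = TRUE),
                  levels = likert, ordered = TRUE)

print(table(colset))          # counts for one variable
tab <- table(colset, ordset)  # cross-tab of a nominal and an ordinal variable
print(tab)

# Chi-square test of association (may warn when expected counts are small)
print(chisq.test(tab))

# Hypothetical helper: gamma = (C - D) / (C + D), where C and D count the
# concordant and discordant pairs in a table with ordered rows and columns.
gk_gamma <- function(tab) {
  tab <- as.matrix(tab)
  nr <- nrow(tab)
  nc <- ncol(tab)
  concordant <- 0
  discordant <- 0
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      # cells below and to the right are concordant with cell (i, j)
      if (i < nr && j < nc)
        concordant <- concordant + tab[i, j] * sum(tab[(i + 1):nr, (j + 1):nc])
      # cells below and to the left are discordant with cell (i, j)
      if (i < nr && j > 1)
        discordant <- discordant + tab[i, j] * sum(tab[(i + 1):nr, 1:(j - 1)])
    }
  }
  (concordant - discordant) / (concordant + discordant)
}

# Two independently sampled ordinal variables should give a gamma near 0
print(gk_gamma(table(ordset, ordset2)))
```

Gamma ranges from -1 (every pair discordant) to 1 (every pair concordant); tied pairs are ignored, which is one way it differs from other ordinal measures.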

Learn to Code in R: for Loops and tapply, lapply, and sapply.

Continuing the discussion of for loops and apply functions brings us to another set of apply functions used to, well, apply a function to data in different ways. In this post, I will be discussing the arrays or data arrangements for which the different apply functions are designed (that is, when to use each one), and comparing for loops to tapply, lapply, and sapply. I will write for loops for each so you can better familiarize yourself with for loops and with situations where you can use the apply functions instead. The data I will be using is the same data set that I used for the apply function post. This is some code I used to prepare the data to get it to its current state, some of which I will discuss later. I mostly provide this for the sake of disclosure and clarity.
lapply and sapply: Apply a Function over a Vector or List
This is the most obvious replacement for a for loop. You give lapply the information set that you wish to iterat...
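To make the comparison concrete, the sketch below writes each computation once as a for loop and once with lapply(), sapply(), or tapply(). It uses the built-in mtcars data set as a stand-in, since the post's own data set is not shown here.

```r
# Minimal sketch: for loops versus lapply(), sapply(), and tapply().
# mtcars is a built-in stand-in for the post's data set.
cols <- c("mpg", "hp", "wt")

# for loop: preallocate a named numeric vector, then fill it element by element
loop_means <- numeric(length(cols))
names(loop_means) <- cols
for (col in cols) {
  loop_means[col] <- mean(mtcars[[col]])
}

# lapply() iterates over a vector or list and always returns a list
lapply_means <- lapply(cols, function(col) mean(mtcars[[col]]))

# sapply() does the same but simplifies the result to a vector when it can;
# for a character input it also uses the values as names
sapply_means <- sapply(cols, function(col) mean(mtcars[[col]]))
print(sapply_means)

# tapply() applies a function to subsets of a vector defined by a grouping
# variable: here, mean mpg for each number of cylinders
tap <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(tap)

# The equivalent for loop over the groups
groups <- sort(unique(mtcars$cyl))
loop_group_means <- numeric(length(groups))
names(loop_group_means) <- groups
for (g in groups) {
  loop_group_means[as.character(g)] <- mean(mtcars$mpg[mtcars$cyl == g])
}
print(loop_group_means)
```

The loops and the apply calls give the same numbers here; the apply versions mainly save the bookkeeping of preallocating and indexing the result.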

Learn to Code in R: Introduction to R and Basic Concepts.

There are many options when it comes to statistical computing, but R is freely available, powerful, robust, and always getting better. Many commercial statistical software packages have exorbitant costs associated with obtaining personal or group licenses, but with R you get an extremely powerful software package that is just as good, if not better, at no cost! This software is ever-improving and growing thanks to the many people who contribute to the project and make this all possible. This post is designed to be a first exposure to R for those who have no experience and want to start learning how to code. Whether you are a student in a stats course or are trying to acquire a little R know-how to expand your business intelligence skills, this post is designed to help you get started. In this post, I will give you basic R skills so you can start doing simple analyses quickly. Specifically, I will be covering how to acquire R and RStudio. Rs...