Posts

Learn to Code in R: Introduction to R and Basic Concepts.

There are many options when it comes to statistical computing, but R is freely available, powerful, robust, and always getting better. Most statistical software packages have exorbitant costs associated with obtaining personal or group licenses. But with R, you get an extremely powerful software package that is just as good, if not better, for no cost! This software keeps improving and growing thanks to the many people who contribute to the project and make it all possible. This post is designed to be a first exposure to R for those who have no experience and want to start learning how to code. Whether you are a student in a stats course or are trying to acquire a little R know-how to expand your business intelligence skills, this post is designed to help you get started. In this post, I will give you a basic knowledge of R skills so you can start doing simple analyses quickly. Specifically, I will be covering how to acquire R and RStudio. Rs...
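As a taste of the "simple analyses" this post aims for, here is a minimal sketch of a first R session. The object name `scores` and its values are made up for illustration; the functions used (`c()`, `length()`, `mean()`, `sd()`, `summary()`) are standard base R.

```r
# Assign a vector of made-up values to an object with the <- operator
scores <- c(72, 85, 90, 66, 78)

length(scores)   # number of values in the vector
mean(scores)     # arithmetic mean
sd(scores)       # sample standard deviation
summary(scores)  # min, quartiles, median, mean, and max in one call
```

Typing each line at the RStudio console prints its result immediately, which is a quick way to explore a dataset before writing longer scripts.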

Learn to Code in R: For Loops and the Apply Function.

When analyzing data, you often have to iterate through a set of values or apply the same function to that set. While R does not have a great reputation for iterative processes, the apply functions are a way around writing a slow for loop. Mastering the apply functions will make your coding much more efficient and versatile. In this post, I will discuss the following: for loops, including when and how to use them (I'll also briefly mention while loops), and how to use apply. I will address tapply, lapply, and sapply in a subsequent post. To help demonstrate how the apply function can be used instead of a for loop, I will carry out the same task using both methods. Before I do that, I am going to go over some looping basics for those who may be unfamiliar or may need a review. For Loop Basics: A for loop iterates through the elements of a vector (a set of values), and at each iteration the current element is represented by the object named in the for statement. ...
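To illustrate the idea of doing the same task both ways, here is a minimal sketch that computes row sums of a small matrix first with a for loop and then with `apply()`. The matrix and the choice of `sum` are assumptions for illustration, not necessarily the task used in the full post.

```r
# A small example matrix: 4 rows, 3 columns, filled column by column
m <- matrix(1:12, nrow = 4)

# Approach 1: a for loop that computes each row's sum
row_sums_loop <- numeric(nrow(m))     # preallocate the result vector
for (i in seq_len(nrow(m))) {
  # on each iteration, i takes the next value from 1, 2, 3, 4
  row_sums_loop[i] <- sum(m[i, ])
}

# Approach 2: the same task with apply(); MARGIN = 1 means "by row"
row_sums_apply <- apply(m, 1, sum)

print(row_sums_loop)   # [1] 15 18 21 24
print(row_sums_apply)  # [1] 15 18 21 24
```

Both give the same answer; the `apply()` version needs no preallocation or index bookkeeping. Using `MARGIN = 2` instead would apply the function to each column.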

Brave Browser: An Innovative New Online Marketing Paradigm

While the majority of my posts are about analyses of one kind or another, I occasionally come across a new way of thinking or a new endeavor that I feel represents worthwhile knowledge in its own right. One topic that has fascinated me for some time is cryptocurrency and the proliferation of alt-coins (crypto that isn't Bitcoin). One day, while I was looking around to see what kind of crypto was out there and what might be worth a look, I stumbled upon a digital currency called BAT (Basic Attention Token). What does a cryptocurrency have to do with a new way that people interact with online marketing, you ask? Well, this crypto is a digital token used to reward people for their attention, or in other words, for viewing an ad. This idea alone turns our typical experience with online advertising on its head. Let's describe the typical paradigm of online ads and then compare it to this new way of doing things. The typical online ad parad...

Analyzing Text and Sentiment Analysis in R: Amazon Product Review Example

Data analysts don't always have the luxury of numerical data to analyze. Often, data comes in the form of open text. For example, consumer product reviews or feedback, and comment threads through online merchants or CRM (customer relationship management, e.g. Salesforce) portals, can all be open text. It's no simple task turning open text into usable information. Word clouds are one way of approaching this task by highlighting the most frequently used terms. There are a number of word cloud libraries in R, my favorite being "wordcloud2". It outputs an HTML document that allows you to hover over cloud terms to see their frequencies. I mention this because word clouds are so common; however, I won't spend any more time on them in this post. In this post I'll cover the following: extracting online data using 'rvest' (very briefly), basic options for cleaning text data, and the polarity function from the qdap package. We live...
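Basic text cleaning and term counting can be sketched in base R alone. The review strings and the tiny stop-word list below are made up for illustration; this sketch does not reproduce the post's rvest scraping or qdap's `polarity()`, it only shows common cleaning steps and the term frequencies that a word cloud displays.

```r
# Hypothetical review snippets (made up for illustration)
reviews <- c("Great product!! Works GREAT.",
             "Battery life is poor; product arrived late.",
             "Great value, would buy again.")

clean <- tolower(reviews)                  # normalize case
clean <- gsub("[[:punct:]]", " ", clean)   # replace punctuation with spaces
clean <- gsub("[[:space:]]+", " ", clean)  # collapse repeated whitespace
clean <- trimws(clean)                     # drop leading/trailing spaces

words <- unlist(strsplit(clean, " "))      # split into individual words
stop_words <- c("is", "would", "again")    # hypothetical, tiny stop-word list
words <- words[!words %in% stop_words]     # remove stop words

freq <- sort(table(words), decreasing = TRUE)  # term counts, most common first
head(freq, 3)
```

In practice you would use a fuller stop-word list; the order of the steps matters, since lowercasing first lets "Great" and "GREAT" count as the same term.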

Network Analysis in R: Visualizing Network Dynamics

Network analysis is just a moniker for graphically describing network relationships. Whether you are a health official trying to describe the spread of communicable diseases or a business analyst describing the progress of a sales campaign or incentive, network analysis helps others view and better understand a network's dynamics. You will need to install the 'network' package for this. In this post I will do the following: provide a simple made-up example to illustrate what network analysis is; expand upon that example by adding hyperedges, different shapes and colors, and different labels for vertices and edges to convey additional information; and provide R code with explanations of how to generate these graphics. Let's begin with a quick example so it is clear what network analysis is. At its simplest, a network analysis is a graphical depiction of the movement of some unit among various entities. In the above graphic, I have nine entities with the arr...
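Underneath any network graphic is a record of which entity sends a unit to which. As a minimal base-R sketch (the post itself uses the 'network' package for plotting; the entities A to D and their movements here are hypothetical), the movements can be stored as an edge list and converted into an adjacency matrix:

```r
# Hypothetical movements: each row is one transfer from one entity to another
edges <- data.frame(
  from = c("A", "A", "B", "C", "C"),
  to   = c("B", "C", "C", "A", "D"),
  stringsAsFactors = FALSE
)

entities <- sort(unique(c(edges$from, edges$to)))  # all vertices: A, B, C, D

# Adjacency matrix: cell [i, j] counts movements from entity i to entity j
adj <- matrix(0, nrow = length(entities), ncol = length(entities),
              dimnames = list(entities, entities))
for (k in seq_len(nrow(edges))) {
  adj[edges$from[k], edges$to[k]] <- adj[edges$from[k], edges$to[k]] + 1
}

out_degree <- rowSums(adj)  # movements leaving each entity
in_degree  <- colSums(adj)  # movements arriving at each entity
adj
```

A square matrix like `adj` is the kind of input that graphing packages such as 'network' can turn into a plot of vertices and directed edges, and the degree counts summarize which entities send or receive the most.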

Online Statistics Tutor: Introduction to Hypothesis Testing - Understanding and Interpreting Statistical Hypothesis Tests

Regardless of the statistical test you are using, the process of rejecting or retaining a null hypothesis can be confusing for many. Rather than targeting any one hypothesis test, I'll discuss the general logic. My intention with this post is to give students of introductory statistics courses (or anyone attempting to learn these concepts) some additional insight into how to understand and interpret hypothesis tests. Whether you are conducting a t-test, F-test, or chi-square test, or are testing regression coefficients from a model, the general idea behind it all is the same. All statistical hypothesis tests follow the same general approach of testing the scenario described by the null hypothesis: that there is no association with, or detectable effect on, your outcome variable (also known as the dependent variable). The alternative hypothesis is usually the research hypothesis, e.g. soda affects obesity, or excessive exposure to business meetings is associated with reduced brain funct...
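The general logic (assume the null hypothesis, compute a test statistic and p-value, then compare the p-value to a chosen significance level) can be sketched with a two-sample t-test. The data below are simulated and the significance level of 0.05 is an assumed convention; the same reject/retain decision applies to F-tests, chi-square tests, and regression coefficients.

```r
set.seed(42)  # makes the simulated data reproducible
# Hypothetical outcome measured in two groups of 30
group_a <- rnorm(30, mean = 50, sd = 10)
group_b <- rnorm(30, mean = 55, sd = 10)

alpha <- 0.05                        # chosen significance level (a convention)
result <- t.test(group_a, group_b)   # Welch two-sample t-test (R's default)

# Null hypothesis: the two population means are equal
decision <- if (result$p.value < alpha) "reject" else "retain"
cat("p-value:", result$p.value, "-> decision:", decision, "the null hypothesis\n")
```

A small p-value means data this extreme would be unlikely if the null hypothesis were true; it does not prove the alternative hypothesis, and "retain" is not the same as showing the null is true.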

Online Statistics Tutor: Normal Confidence Intervals - Beginnings of Statistical Uncertainty

This online statistics tutor lesson is intended to supplement introductory statistics material as additional instruction and review. In this lesson we will cover only beginning concepts regarding confidence intervals around an estimated mean. Estimating confidence intervals uses essentially the same principles and concepts used for calculating z-scores and normal probabilities (at least for CIs for means estimated from normal data). If you need a refresher on these concepts, check out one of my other posts. Uncertainty in Research and Statistics: Though many are reluctant to admit it, there is a great deal of uncertainty in the information that we consume. Information sources (including legitimate sources) boast new conclusions about the world around us, from healthy eating and everyday behavior to climate change and astrophysics. Something that many media sources gloss over is that NONE of them are 100% sure about their hypothesized conclusions. These conclusion...
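The z-score connection can be made concrete. For normal data with a known population standard deviation σ, a confidence interval for the mean is x̄ ± z·σ/√n, where z is the standard normal quantile for the chosen confidence level. Here is a minimal sketch; the sample is simulated and σ = 15 is assumed known purely for illustration.

```r
set.seed(1)
# Hypothetical sample: 25 measurements from a normal population
x <- rnorm(25, mean = 100, sd = 15)

sigma <- 15        # population SD, assumed known for a z-based interval
n     <- length(x)
x_bar <- mean(x)   # the estimated mean

conf_level <- 0.95
z <- qnorm(1 - (1 - conf_level) / 2)  # about 1.96 for a 95% interval
margin <- z * sigma / sqrt(n)         # z times the standard error of the mean

ci <- c(lower = x_bar - margin, upper = x_bar + margin)
ci
```

Raising `conf_level` to 0.99 increases z and widens the interval: more confidence costs precision. When σ is unknown and estimated from the sample, a t quantile (`qt()`) replaces `qnorm()`.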