
R Code Snippets

This article collects code snippets for the R programming language. I am fairly new to R, and documenting the snippets helps me understand it better. A word of caution: the snippets are not neatly organized. This is a personal reference to keep track of the concepts I have learned in R.

Clear variables from the workspace

rm(list=ls())

Clear Plots

dev.off(dev.list()["RStudioGD"])

Basics

Installing libraries

install.packages("ggplot2")

Loading a library

As an example, the ggplot2 library can be loaded as follows:

library(ggplot2)
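
When a script may run on a machine where the package is not installed yet, the two steps above can be combined. This is only a minimal sketch: the helper name load_or_install is my own (hypothetical), and it relies on base R's requireNamespace() to check whether a package is available.

```r
# Hypothetical helper: install a package only if it is missing, then attach it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call never triggers an install
load_or_install("stats")
```

Replacing "stats" with "ggplot2" would install ggplot2 on first use and simply load it afterwards.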

Printing

# Example of using sprintf() inside print()
x <- 10
print(sprintf("The value of x is %d", x))
# Example of using paste() inside print()
name <- "John"
age <- 30
print(paste("Name:", name, ", Age:", age))

Conditionals

# Example of an else if statement
x <- 10
if (x > 20) {
  print("x is greater than 20")
} else if (x > 10) {
  print("x is greater than 10 but less than or equal to 20")
} else {
  print("x is less than or equal to 10")
}

While loops

# Example of a while loop
i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1
}

For loops

# Example of a for loop
for (i in 1:5) {
  print(i)
}
# Looping over elements of a vector
element_vector <- c("a", "b", "c", "d", "e")
for (element in element_vector) {
  print(element)
}

Creating a sequence of numbers given a range

# To create a sequence from 1 to 10 in steps of 1
sequence <- seq(1, 10, by=1)
print(sequence)

Another way to create a sequence is with the colon operator (:), which always uses steps of 1:

# Create a sequence from 1 to 100
i <- 1:100
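
A few more variations on sequences that I find handy. This is a small sketch using only base R; the variable names s1 to s4 are just for illustration.

```r
# seq() with length.out: 5 evenly spaced values between 0 and 1
s1 <- seq(0, 1, length.out = 5)
print(s1)

# seq_len(n) gives 1..n and returns an empty vector when n is 0,
# which makes it safer than 1:n inside loops
s2 <- seq_len(3)
s3 <- seq_len(0)
print(length(s3))

# Counting downwards needs a negative step
s4 <- seq(10, 1, by = -3)
print(s4)
```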

Creating vectors

# Creating empty vectors 
empty_vec <- c()

# Creating a vector of a specific length and type (numeric vectors start filled with zeros)
empty_vector <- vector("numeric", length = 5)

# Creating a numeric vector
numeric_vector <- c(1, 2, 3, 4, 5)

# Creating a character vector
character_vector <- c("a", "b", "c", "d", "e")

# Creating a logical vector
logical_vector <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
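
To see what the "empty" vectors above actually contain, here is a quick check. It is a sketch with base R only; the names zeros and values are my own.

```r
# c() with no arguments returns NULL
empty_vec <- c()
print(is.null(empty_vec))

# vector("numeric", length = 5) is not really empty: it holds five zeros
zeros <- vector("numeric", length = 5)
print(zeros)

# Growing a vector one element at a time, starting from an empty one
values <- c()
for (i in 1:3) {
  values <- c(values, i^2)
}
print(values)
```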

Creating matrices

Creating an empty matrix filled with zeros (or any specific values)

# num_rows and num_cols are placeholders for the desired dimensions
mat <- matrix(0, nrow = num_rows, ncol = num_cols)

Creating matrices by stacking vectors (vertically and horizontally)

col1 <- c(1, 4, 7)
col2 <- c(2, 5, 8)
col3 <- c(3, 6, 9)

# cbind() uses each vector as a column
mat_cbind <- cbind(col1, col2, col3)
# rbind() uses each vector as a row
mat_rbind <- rbind(col1, col2, col3)
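
To convince myself which function stacks in which direction, the sketch below rebuilds the two matrices and compares them. The variable same is my own name for the comparison result.

```r
col1 <- c(1, 4, 7)
col2 <- c(2, 5, 8)
col3 <- c(3, 6, 9)

mat_cbind <- cbind(col1, col2, col3)  # vectors become columns
mat_rbind <- rbind(col1, col2, col3)  # vectors become rows

print(mat_cbind)
print(dim(mat_cbind))  # number of rows, number of columns

# The two results are transposes of each other
same <- all(mat_cbind == t(mat_rbind))
print(same)
```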

Generating samples from a probability distribution

Generating samples from a uniform distribution

Reference: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/Uniform

# Sample 20 values from a uniform distribution which ranges from -1 to 1
x <- runif(20, min = -1, max = 1)

Generating samples from a normal distribution

Reference: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/Normal

# Draw 100 samples from a normal distribution with a specified mean and standard deviation
x <- rnorm(n = 100, mean = 68.5, sd = 5.7)
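
Random draws change on every run unless the seed is fixed. The sketch below uses set.seed() to make the samples reproducible and does a rough sanity check on the results. The seed value 42 and the sample sizes are arbitrary choices of mine.

```r
# Same seed, same draws
set.seed(42)
a <- rnorm(n = 5, mean = 68.5, sd = 5.7)
set.seed(42)
b <- rnorm(n = 5, mean = 68.5, sd = 5.7)
print(identical(a, b))

# Uniform samples always stay inside [min, max]
u <- runif(1000, min = -1, max = 1)
print(range(u))

# With many samples, the sample mean and sd land close to the parameters
big <- rnorm(n = 10000, mean = 68.5, sd = 5.7)
print(mean(big))
print(sd(big))
```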

Operations on data frames

Getting a summary for the data frame

summary(data)

Getting the number of columns in a data frame

# Get the number of columns in the data frame
num_cols_df <- ncol(data_frame)
print(num_cols_df) 

Getting the number of rows in a data frame

# Get the number of rows in the data frame
num_rows_df <- nrow(data_frame)
print(num_rows_df) 
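
The snippets above assume a data frame already exists. Here is a self-contained sketch with a small made-up data frame (the column names and values are invented for illustration) so the calls can be tried directly.

```r
# A small made-up data frame
data_frame <- data.frame(
  name   = c("a", "b", "c"),
  height = c(150, 160, 170),
  weight = c(50, 60, 70)
)

# Per-column summary (min, quartiles, mean, max for numeric columns)
print(summary(data_frame))

num_cols_df <- ncol(data_frame)
num_rows_df <- nrow(data_frame)
print(c(num_rows_df, num_cols_df))

# dim() returns both at once: rows first, then columns
print(dim(data_frame))
```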

ggplot commands

Basic ggplot plot

Specify the data: In ggplot(), the first argument is the data frame and the second argument, aes(), maps columns of the data frame to the x-axis and y-axis. If the plot involves only a single column (for example, a histogram), we only need to pass one column as x.

ggplot(data,aes(x=price))

Specify the type of plot: Next, we add a geom to specify the type of plot. Additional arguments to the geom control the look of the plot.

# Plot a histogram with binwidth = 50 and specify the edge and fill colors
ggplot(data, aes(x = price)) +
  geom_histogram(binwidth = 50, col = '#9683F5', fill = '#D2CDE9')

As an example, here we plot a histogram with geom_histogram(). There are far too many parameters to go through, so the best approach is to look up the documentation when you need something specific: https://ggplot2.tidyverse.org/reference/geom_histogram.html
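
For a version that runs without any external file, the sketch below uses the diamonds dataset that ships with ggplot2. That dataset is my assumption for illustration (it happens to have a price column); it is not the data used elsewhere in this post.

```r
library(ggplot2)

# diamonds is a dataset bundled with ggplot2 (assumed here for illustration)
p <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 50, color = "#9683F5", fill = "#D2CDE9") +
  labs(x = "Price", y = "Count")

# Printing the object draws the plot
print(p)
```

Storing the plot in a variable like p makes it easy to add more layers or a theme later.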

Clean themes

This is kinda subjective and varies from person to person. But I normally use the following code snippets as a theme. This code snippet needs to be varied for different types of plots.

theme_bw() +
  theme(panel.border = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.major.y = element_line(linetype = "dashed", color = "black"),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.line = element_line(color = "white"))

2D density contour plot with heatmap overlay

m <- ggplot(data, aes(x = HEIGHT, y = WEIGHT)) +
  geom_point() +
  stat_bin2d(bins = 80) +
  scale_fill_gradient(low = "lightblue", high = "red")

# Overlay the 2D density contours and draw the plot
m + geom_density_2d()

Save an image with ggplot

# Save the last plot as "plot.png", 5 x 5 inches (inches are the default unit)
ggsave("plot.png", width = 5, height = 5)

Statistical Concepts

Regression

Fitting a simple regression model

The following code fits a simple linear regression model of the form:

\[y= \beta_0+\beta_1x+ \epsilon\]
data <- read.csv("data.csv")

# Fit the regression model
model <- lm(y ~ x, data = data)

# Summary of the model
summary(model)

# Coefficients:
coefficients(model)
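
To try this without a data.csv file, the sketch below simulates data from a known line and checks that lm() recovers the coefficients. The true values (intercept 2, slope 3), the noise level and the name sim_data are all my own choices for illustration.

```r
set.seed(1)

# Simulate y = 2 + 3x + noise
x <- runif(200, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(200, mean = 0, sd = 1)
sim_data <- data.frame(x = x, y = y)

# Fit the regression model
model <- lm(y ~ x, data = sim_data)

# The estimates should be close to 2 (intercept) and 3 (slope)
print(coefficients(model))

# Predict y for new x values
new_points <- data.frame(x = c(2, 5))
print(predict(model, newdata = new_points))
```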

Include non-linear terms in the regression model

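Non-linear terms can be introduced with the help of the poly() function. By default poly() uses orthogonal polynomials, so the reported coefficients match the powers of x below only if raw = TRUE is passed; the fitted values are the same either way. The following code snippet fits a regression of the form: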

\[y=\beta_0+\beta_1 x+\beta_2 x^2+ \beta_3 x^3+ \beta_4 x^4+ \beta_5 x^5+ \epsilon\]
data <- read.csv("data.csv")

model <- lm(y ~ poly(x, 5), data = data)

# Summary 
summary(model)
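
One thing that tripped me up: poly(x, 2) and poly(x, 2, raw = TRUE) give the same fitted curve but different coefficients. The sketch below shows this on simulated data. The quadratic used to generate the data and the names fit_orth and fit_raw are my own choices for illustration.

```r
set.seed(2)

# Simulate y = 1 + 0.5x - 2x^2 + noise
x <- seq(-2, 2, length.out = 100)
y <- 1 + 0.5 * x - 2 * x^2 + rnorm(100, sd = 0.3)
d <- data.frame(x = x, y = y)

fit_orth <- lm(y ~ poly(x, 2), data = d)              # orthogonal polynomials (default)
fit_raw  <- lm(y ~ poly(x, 2, raw = TRUE), data = d)  # plain powers x and x^2

# Both parameterizations describe the same fitted curve
print(all.equal(fitted(fit_orth), fitted(fit_raw)))

# Only the raw fit has coefficients that map directly to 1, 0.5 and -2
print(coefficients(fit_raw))
```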

Detecting multicollinearity

To detect multicollinearity in a regression, the variance inflation factor (VIF) can be computed. As a rule of thumb, $\text{VIF}>10$ indicates problematic multicollinearity.

library(car)

# Fit a regression model ....
# Compute VIF
vif_result <- vif(model)

# Print VIF values
print(vif_result)
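
To understand what the number means, the VIF of a predictor can also be computed by hand: regress that predictor on all the other predictors, take the R-squared, and compute 1 / (1 - R-squared). The sketch below does this in base R on simulated predictors. The data and the name vif_manual are my own, and this is not how the car package implements vif() internally.

```r
set.seed(3)
n <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly a copy of x1, so strongly collinear
x3 <- rnorm(n)                 # unrelated predictor
d <- data.frame(x1, x2, x3)

# For each predictor: regress it on the others and turn R-squared into a VIF
vif_manual <- sapply(names(d), function(v) {
  others <- setdiff(names(d), v)
  r2 <- summary(lm(reformulate(others, response = v), data = d))$r.squared
  1 / (1 - r2)
})

# x1 and x2 should be far above 10, x3 close to 1
print(vif_manual)
```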

Personalized snippets

Split continuous numerical variables in different classes

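# With right = FALSE each interval is [lower, upper), so integer ages
# 10-20 fall in the first group, 21-40 in the second, and so on
breaks <- c(10, 21, 41, 61, 81, Inf)
labels <- c("10-20", "21-40", "41-60", "61-80", "81+")

# Create a new column with age groups
data$age_group <- cut(data$AGE, breaks = breaks, labels = labels, right = FALSE)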

In the example above, AGE is a continuous variable. A new column, age_group, is created by assigning each age to one of the labeled groups.
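
The boundaries are the easy part to get wrong with cut(), so here is a quick check on a few edge-case ages. With right = FALSE each interval includes its lower break and excludes its upper one. The breaks below are my own choice, picked so that integer ages line up with the labels.

```r
ages <- c(10, 20, 21, 40, 41, 80, 81, 95)

breaks <- c(10, 21, 41, 61, 81, Inf)
labels <- c("10-20", "21-40", "41-60", "61-80", "81+")

# right = FALSE: intervals are [10, 21), [21, 41), [41, 61), [61, 81), [81, Inf)
age_group <- cut(ages, breaks = breaks, labels = labels, right = FALSE)

print(data.frame(ages, age_group))

# Count how many values fall in each group
print(table(age_group))
```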

Get a random subset from a data frame

# Number of rows to sample
k <- 10
random_indices <- sample(nrow(data), k)
subset_dataset <- data[random_indices, ]
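
A self-contained sketch of the same idea, with a seed so the subset is reproducible. The data frame here is made up for illustration, and the seed value is arbitrary.

```r
set.seed(123)

# Made-up data frame with 50 rows
data <- data.frame(id = 1:50, value = rnorm(50))

k <- 10
# sample() draws without replacement by default, so no row is picked twice
random_indices <- sample(nrow(data), k)
subset_dataset <- data[random_indices, ]

print(nrow(subset_dataset))
print(anyDuplicated(random_indices))  # 0 means no duplicates
```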
This post is licensed under CC BY 4.0 by the author.