II. Actions

A. Assignment

The most basic action that you will perform in R is “assignment”. The assignment operator is <- or =. Assignment takes what is on the right side of the assignment operator and “stores” it in the “variable” on the left side of the operator. x <- 1 means that x will hold the value 1 until you assign it something else.

x <- 1
x #You can print the information stored in any variable by simply entering that variable into the command prompt
## [1] 1
x + 1
## [1] 2
x + 2
## [1] 3
x - 1
## [1] 0
x <- 3
x + 1
## [1] 4
x * 2
## [1] 6
x / 2
## [1] 1.5
y <- 2
y + x
## [1] 5
z <- y + x
z + 2
## [1] 7
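Here is a short sketch, using only base R, of what “storing” means in practice. The names a and b are just for illustration. Note that assigning one variable to another copies the value, so changing x afterwards leaves y alone.

```r
# <- and = both assign at the top level
a <- 10
b = 10
identical(a, b)
## [1] TRUE

# Assignment copies the value: y keeps 1 even after x changes
x <- 1
y <- x
x <- 100
y
## [1] 1
```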

You can also store text (called a “character string”) in a variable:

x <- "Hello world"
x
## [1] "Hello world"

It is also possible to store multiple numbers into a variable.

x <- 1:3
x
## [1] 1 2 3
x + 2
## [1] 3 4 5
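1:3 only produces consecutive whole numbers. To store an arbitrary set of numbers, you can combine them with the base R function c(). Arithmetic then applies to every element, just as it did for 1:3.

```r
x <- c(2, 4, 10)   # c() ("combine") stores any set of numbers, not just a sequence
x
## [1]  2  4 10
x * 2              # arithmetic applies to every element
## [1]  4  8 20
length(x)          # how many numbers are stored
## [1] 3
```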

Variable names

We’ll spend more time on what types of elements can be stored in variables, but for now, let’s discuss the variable name itself. So far, we’ve been using x and y a lot, but those are definitely not the only variable names you can use. A variable name can include any letter in any arrangement.

variable <- 1
elbairav <- 2
v <- 3
variablevariable <- 4

But be careful: R is case-sensitive.

variable <- 1
VARIABLE <- 2
variable
## [1] 1
casematters <- "hello"
CaseMatters <- "world"
CaseMatters == casematters
## [1] FALSE

You can also use numbers and the special characters . and _.

variable1 <- 1
variable2 <- 2
variable.name <- "Howdy Earth"
variable_name <- "Hallo Welt"

Variable names can start with uppercase or lowercase letters but cannot start with numbers or the underscore _.

1x <- 1
## Error: <text>:1:2: unexpected symbol
## 1: 1x
##      ^
_variable <- 1
## Error: <text>:1:2: unexpected symbol
## 1: _variable
##      ^
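If you are unsure whether a name follows these rules, one way to check it is base R’s make.names(). That function returns a name unchanged only when it is already a valid variable name. This is a sketch rather than the only approach, and is_valid_name() is a hypothetical helper written for this example.

```r
# is_valid_name() is a hypothetical helper, not a built-in R function:
# a name is valid if make.names() leaves it unchanged
is_valid_name <- function(name) {
  make.names(name) == name
}

is_valid_name(c("variable1", "variable.name", "variable_name", "1x", "_variable"))
## [1]  TRUE  TRUE  TRUE FALSE FALSE
```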

More info!

Note: R does allow variable names to start with . as long as it’s not followed by a number.

.x <- 1
._x <- 2
.2x <- 1
## Error: <text>:1:3: unexpected symbol
## 1: .2x
##       ^

However, variables whose names start with . are hidden: ls() does not list them.

# ls() lists the variables in your environment (except hidden ones)
ls()
##  [1] "casematters"      "CaseMatters"      "elbairav"         "v"               
##  [5] "variable"         "VARIABLE"         "variable_name"    "variable.name"   
##  [9] "variable1"        "variable2"        "variablevariable" "x"               
## [13] "y"                "z"
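The hidden variables still exist, because ls() simply skips them by default. Its all.names argument asks for hidden names too, as this small base R sketch shows:

```r
.x <- 1
".x" %in% ls()                    # hidden: not in the default listing
## [1] FALSE
".x" %in% ls(all.names = TRUE)    # all.names = TRUE includes hidden names
## [1] TRUE
```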

At this stage, always start your variable names with a letter.

So, how do you choose what to name a variable? This is actually a bit trickier than you would think. R doesn’t care. R is happy with whatever you use (as long as it follows the previously mentioned rules). But remember that R is picky and doesn’t understand things the way humans do. For example, R is perfectly happy to let you do something insane like:

why_is_hello_world_the_phrase_that_is_always_in_intro_to_programming_lessons <- "Hello world"
why_is_hello_world_the_phrase_that_is_always_in_intro_to_programming_lessons
## [1] "Hello world"

Or:

numeric_variable <- "character_string"
is.numeric(numeric_variable)
## [1] FALSE

Or:

character_variable <- 2
wordVariable <- 3

integer.variable <- character_variable/wordVariable
integer.variable
## [1] 0.6666667
is.integer(integer.variable)
## [1] FALSE

R just doesn’t care. But humans do. And humans will read your code (at least one human: you). So variable names like those above work perfectly fine in R, but are hell when you are trying to figure out what is going on. Your code will break and produce unexpected results, and misleading names make it much harder to work out why, especially once you have forgotten what each variable was for.
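For comparison, here is the same division as the integer.variable example above, rewritten with descriptive names. The names and the quiz scenario are hypothetical. R computes exactly the same result, but a human reader can now tell what the numbers mean.

```r
# Hypothetical scenario: 2 of 3 quiz questions answered correctly
correct_answers <- 2
total_questions <- 3
proportion_correct <- correct_answers / total_questions
proportion_correct
## [1] 0.6666667
```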

Reserved keywords

R has a list of names that have special purposes (for example, if, else, function, TRUE and NULL); these are off-limits as variable names. You can see the full list with ?reserved:

?reserved
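Typing if <- 1 at the console produces a syntax error. The sketch below demonstrates this without stopping a script by parsing the code as text inside tryCatch(). parse() and tryCatch() are base R, while parses_ok() is a hypothetical helper written for this example.

```r
# parses_ok() is a hypothetical helper: TRUE if the code is valid R syntax
parses_ok <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

parses_ok("my_if <- 1")       # an ordinary name
## [1] TRUE
parses_ok("if <- 1")          # if is reserved
## [1] FALSE
parses_ok("function <- 1")    # so is function
## [1] FALSE
```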

B. Functions

What if we need to do something like find the mean or standard deviation, or even convert Fahrenheit to Celsius?

### 3 temperature converting functions!
 # Note: R doesn't care that Celsius is misspelled. As long as your variable names are consistent, R is happy. This is good news for people who are bad spellers. Well, as long as you are consistent in how you misspell words. 

# F to C
fahrenheit_to_celcius <- function(fahrenheit){
  (fahrenheit - 32) * 5/9 # R follows the conventional order of operations: Parentheses, Exponents, Multiplication/Division, Addition/Subtraction
}

# C to F
celcius_to_fahrenheit <- function(celcius){
  celcius * (9/5) + 32
}

# Both
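temp_converter <- function(input_temperature = 32, output_scale = "Fahrenheit"){ # output_scale is the scale to convert TO
  if (output_scale == "Fahrenheit") {
      input_temperature * (9/5) + 32  # input is assumed to be in Celcius
  } else if (output_scale == "Celcius") {
      (input_temperature - 32) * 5/9  # input is assumed to be in Fahrenheit
  } else {
    # stop() raises an error; errorCondition() would only create one without signaling it
    stop("Did you misspell Celcius or Fahrenheit? Please use 'Celcius' or 'Fahrenheit' with the first letter capitalized")
  }
}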
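To use one of these functions, write its name followed by the value in parentheses. The sketch below repeats the fahrenheit_to_celcius() definition so it runs on its own. Like the arithmetic earlier, it also works on several numbers at once.

```r
# Same definition as above, repeated so this block is self-contained
fahrenheit_to_celcius <- function(fahrenheit){
  (fahrenheit - 32) * 5/9
}

fahrenheit_to_celcius(212)              # boiling point of water
## [1] 100
fahrenheit_to_celcius(32)               # freezing point of water
## [1] 0
fahrenheit_to_celcius(c(32, 50, 212))   # several temperatures at once
## [1]   0  10 100
```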

A “function” is code that is written to perform some function. Isn’t it great when terminology is straightforward?!

In this class, we won’t spend a lot of time creating our own functions (like the temperature converters above), but we will use a lot of functions, so it’s important to know the basics.

Every function has the following elements:

  • Name
  • Argument(s)
  • Function Body
  • Return/Output
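function_name <- function(argument_1, argument_2, ...) {
   # Function body: the code that does the work
   # The value of the last expression is the function's return value (output)
}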
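Here is a made-up function with each of the four elements listed above labeled in comments. It assumes nothing beyond base R. The second argument has a default value, which is used when you leave it out.

```r
# add_numbers() is a made-up example function
add_numbers <- function(first_number, second_number = 0) {  # name and arguments (second_number defaults to 0)
  total <- first_number + second_number                      # function body
  total                                                      # return/output: the last value evaluated
}

add_numbers(2, 3)
## [1] 5
add_numbers(2)    # second_number falls back to its default
## [1] 2
```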

i. Function name

The function name exists so that you can easily call the function whenever you need it.

ii. Function arguments

Functions take “arguments” as input. These are the elements that you put inside the parentheses.

x <- 1:5
mean(x) #the variable x is the argument
## [1] 3

Functions can take multiple arguments; in fact, some require more than one. When you don’t name the arguments, their order matters:

seq(5, 85, 20) # seq() creates a sequence of numbers. In this case, from 5 to 85 by 20
## [1]  5 25 45 65 85
seq(20, 85, 5) # From 20 to 85 by 5
##  [1] 20 25 30 35 40 45 50 55 60 65 70 75 80 85

You can also name each argument. For most functions, if you name every argument explicitly, the order no longer matters.

seq(from = 5, to = 85, by = 20)
## [1]  5 25 45 65 85
seq(by = 20, to = 85, from = 5)
## [1]  5 25 45 65 85
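Naming arguments also lets you skip ahead to an argument that is not next in line, and you can mix positional and named arguments. seq() has a length.out argument (documented in ?seq) that asks for a number of values instead of a step size:

```r
seq(5, 85, length.out = 5)   # from and to by position, length.out by name
## [1]  5 25 45 65 85
```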

Some arguments require an input value, and others have a default value that is used if you leave them out. It’s always best to use ? (for example, ?seq) and read the documentation before using a function.
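For example, mean() has an argument na.rm that defaults to FALSE. With that default, missing values (NA) are not removed, so any NA makes the result NA. The scores below are made up for illustration.

```r
quiz_scores <- c(90, 85, NA, 70)   # made-up data with one missing value
mean(quiz_scores)                  # na.rm defaults to FALSE
## [1] NA
mean(quiz_scores, na.rm = TRUE)    # override the default to drop the NA
## [1] 81.66667
```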

iii. Function Body

The action that the function performs is found inside the curly brackets {}. Unless you want to write your own function or you want to look inside of a pre-existing function, you don’t need to worry about this right now.
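If you do want to look inside a function, a minimal way is to type its name without parentheses. The base R functions body() and formals() return just the body or the arguments. The sketch repeats the celcius_to_fahrenheit() definition from earlier so it runs on its own. The printed code for built-in functions such as sd varies between R versions, so no output is shown for those lines.

```r
# Same definition as earlier, repeated so this block is self-contained
celcius_to_fahrenheit <- function(celcius){
  celcius * (9/5) + 32
}

celcius_to_fahrenheit                  # name without (): prints the whole function
body(celcius_to_fahrenheit)            # only the code inside the curly brackets
names(formals(celcius_to_fahrenheit))  # only the argument names
## [1] "celcius"
sd                                     # works for built-in functions too
```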

iv. Function return

The purpose of a function is to produce some result. In the functions we’ve seen so far, the result is a value. These values can be stored in a variable.

x <- seq(from = 1, to = 10, by = 0.5)
mean_of_x <- mean(x)
mean_of_x
## [1] 5.5
# A more realistic (but complicated) example
# rnorm() draws random numbers, so your results will differ slightly from the output shown here

control_group_scores <- rnorm(n = 100, mean = 75, sd = 10)
treatment_group_scores <- rnorm(n = 100, mean = 86, sd = 7)
group <- gl(n = 2, k = 100, length = 200, labels = c("control", "treatment"))

scores <- c(control_group_scores, treatment_group_scores)

linear_model <- lm(scores ~ group)
anova(linear_model)
## Analysis of Variance Table
## 
## Response: scores
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## group       1  5983.8  5983.8  85.133 < 2.2e-16 ***
## Residuals 198 13917.1    70.3                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(linear_model)
## 
## Call:
## lm(formula = scores ~ group)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -23.0627  -4.9536   0.3812   4.6084  21.9716 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     74.8711     0.8384  89.304   <2e-16 ***
## grouptreatment  10.9397     1.1856   9.227   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.384 on 198 degrees of freedom
## Multiple R-squared:  0.3007, Adjusted R-squared:  0.2971 
## F-statistic: 85.13 on 1 and 198 DF,  p-value: < 2.2e-16

Other functions perform actions that are essential to working in R, but won’t (necessarily) produce an object that you can use. For example, you can get your working directory by using the function getwd(). You can change your working directory by using setwd(). list.files() will produce a list of all of the files in your working directory.
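Here is a small sketch of these functions in use, assuming only base R. It temporarily switches to R’s temporary folder (tempdir()) and then switches back. setwd() invisibly returns the directory you were in before, which makes it easy to restore.

```r
current_dir <- getwd()        # the working directory, as a character string
old_dir <- setwd(tempdir())   # change directory; the previous one is returned invisibly
getwd()                       # now points at R's temporary folder
setwd(old_dir)                # change back
identical(getwd(), current_dir)
## [1] TRUE
head(list.files())            # the first few file names in the working directory
```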

Two of the most useful functions that you will use are install.packages() and library(). Packages are like “add-ons” to R. A package contains R functions, example data, and helpful documentation. If you find yourself thinking “I wonder if there is a way to do this”, the answer is most likely “yes, and there is a package that does it”. You can browse the packages on CRAN (the Comprehensive R Archive Network), R’s official package repository, at https://cran.r-project.org. Packages on CRAN are very easy to download and start using: call the install.packages() function with the name of the package, in quotes, as the argument: install.packages("packageName")

The packages you download are stored in your R library. Calling library() with no arguments produces a list of all the packages currently installed. To use a package, you have to load it first: put the package name as the argument of the library() function: library("packageName"). If you don’t know the path to your library, you can use .libPaths().

In this class, we will learn to use two packages: “dplyr” and “ggplot2”.

install.packages("dplyr") # must have quotes around the package name
library(dplyr) # quotes are optional
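Running install.packages() every time a script runs re-downloads the package. One common pattern avoids this by installing only when the package is missing and then loading it. This is a sketch: requireNamespace(), install.packages() and library() are base R, while the helper name load_or_install() is hypothetical. library() normally treats its argument as a bare name, so character.only = TRUE tells it to use the value stored in package_name instead.

```r
# load_or_install() is a hypothetical helper, not a built-in R function
load_or_install <- function(package_name) {
  if (!requireNamespace(package_name, quietly = TRUE)) {  # FALSE if not installed
    install.packages(package_name)
  }
  library(package_name, character.only = TRUE)           # load it
}

load_or_install("stats")    # stats ships with R, so nothing is downloaded
# load_or_install("dplyr")  # would download dplyr from CRAN the first time
```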

================================================================================

Session information:

Last update on 2021-10-11

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_AT.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_AT.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_AT.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.29   R6_2.5.1        jsonlite_1.8.0  magrittr_2.0.3 
##  [5] evaluate_0.16   stringi_1.7.8   cachem_1.0.6    rlang_1.0.5    
##  [9] cli_3.3.0       rstudioapi_0.14 jquerylib_0.1.4 bslib_0.4.0    
## [13] rmarkdown_2.16  tools_4.2.1     stringr_1.4.1   xfun_0.32      
## [17] yaml_2.3.5      fastmap_1.1.0   compiler_4.2.1  htmltools_0.5.3
## [21] knitr_1.40      sass_0.4.2

================================================================================

Copyright © 2022 Dan C. Mann. All rights reserved.