IntroR: Objects

III. Objects

If you’ve worked in R you’ve probably received plenty of error messages that are super confusing. Sometimes those error messages occur because your data are stored as the wrong object type.

Let’s look at two ways to store a range of numbers. In R, the function c() combines values into a vector. You can enter the numbers 1 through 10 in two ways: use the colon operator :, which generates a sequence from the first number to the second in steps of 1, or type out each value.

x <- c(1:10)
y <- c(1,2,3,4,5,6,7,8,9,10)
x

##  [1]  1  2  3  4  5  6  7  8  9 10

y

##  [1]  1  2  3  4  5  6  7  8  9 10

On the surface, the objects x and y look the same. You can check whether they are both numeric objects with the function is.numeric().

is.numeric(x)

## [1] TRUE

is.numeric(y)

## [1] TRUE

Both are numeric. Arithmetic on them gives identical results, yet when you check whether x and y themselves are exactly identical, you find that they are not!

identical(x^2, y^2)

## [1] TRUE

identical(x + 0.5, y + 0.5)

## [1] TRUE

identical(x, y)

## [1] FALSE

So what’s the deal? The : operator returns integers, whereas numbers typed in by hand are stored as floating-point numbers (called “doubles”). The first two comparisons returned TRUE because squaring with ^ and adding 0.5 both produce doubles.

typeof(x)

## [1] "integer"

typeof(y)

## [1] "double"
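If you want integers when typing values by hand, one option is the L suffix; another is converting with as.integer(). A minimal sketch (the name y_int is chosen only for this illustration):

```r
# 1:10 yields integers; typing the values by hand yields doubles.
x <- 1:10
y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# The L suffix marks a literal as an integer.
y_int <- c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L)
typeof(y_int)                # "integer"
identical(x, y_int)          # TRUE

# as.integer() converts an existing double vector to integer.
identical(x, as.integer(y))  # TRUE

# as.numeric() goes the other way, from integer to double.
identical(as.numeric(x), y)  # TRUE
```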

For the most part, R is actually pretty good at dealing with data stored in the wrong format. However, it’s still not as good as a human, and it will make mistakes.
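Here is a small sketch of both sides of that claim. The first part shows R quietly converting integers to doubles when comparing values; the second shows a case where the storage type changes the answer. The name num_as_text is made up for this example.

```r
x <- 1:10                                # integer
y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)    # double

# Element-wise comparison converts the integers to doubles first.
all(x == y)               # TRUE

# all.equal() compares values (with a small numeric tolerance)
# instead of checking the storage type.
isTRUE(all.equal(x, y))   # TRUE

# Numbers stored as text are compared as text, character by character.
num_as_text <- c("9", "10")
num_as_text[2] > num_as_text[1]                           # FALSE: "10" sorts before "9"
as.numeric(num_as_text[2]) > as.numeric(num_as_text[1])   # TRUE
```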

So what are the different data types?

1-dimensional data structures:
- NULL
- Vectors
  - List
  - Atomic vectors
    - Logical
    - Character
    - Numeric
      - Double
      - Integer
2-dimensional data structures:
- Matrix
- Data frame
N-dimensional data structures:
- Array
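To make the outline concrete, the sketch below builds one object of each kind and inspects it. All object names are invented for this illustration.

```r
nothing <- NULL                                        # length 0
a_list  <- list(1, "a", TRUE)                          # vector, mixed types
lgl     <- c(TRUE, FALSE, NA)                          # atomic: logical
chr     <- c("a", "b")                                 # atomic: character
dbl     <- c(1.5, 2, 3)                                # atomic: double
int     <- 1:3                                         # atomic: integer
m       <- matrix(1:6, nrow = 2)                       # 2D, one type overall
df      <- data.frame(id = 1:2, name = c("a", "b"))    # 2D, one type per column
arr     <- array(1:24, dim = c(2, 3, 4))               # N-D, one type overall

is.null(nothing)    # TRUE
is.list(a_list)     # TRUE
typeof(lgl)         # "logical"
typeof(chr)         # "character"
typeof(dbl)         # "double"
typeof(int)         # "integer"

# dim() reports the size along each dimension.
dim(int)            # NULL: a plain vector has no dim attribute
dim(m)              # 2 3
dim(df)             # 2 2
dim(arr)            # 2 3 4
```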

A. 1D structures

The one-dimensional structures are the basic building blocks used to build derived objects like data frames and matrices.

i. Vectors

This term can be a bit confusing, especially since R is used so much in statistics. Here it does not refer to the mathematical notion of a vector; it essentially means a sequence of values. Even a single value is a vector of length 1. Contrast this with a NULL object, which has a length of 0.

x <- 1
length(x)

## [1] 1

# TMI: is.vector() technically checks whether the object is a vector with no
# attributes other than names. To truly check whether an object is a vector,
# use is.atomic(x) || is.list(x). For our purposes now, is.vector() will work.
is.vector(x)

## [1] TRUE

x <- c(1, 2, 3, 4)
length(x)

## [1] 4

is.vector(x)

## [1] TRUE

x <- 1:1000
length(x)

## [1] 1000

is.vector(x)

## [1] TRUE

x <- 0 
length(x)

## [1] 1

is.vector(x)

## [1] TRUE

x <- NULL
length(x)

## [1] 0

is.vector(x)

## [1] FALSE
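The caveat about is.vector() and attributes can be seen in a short sketch. The attribute name "units" is arbitrary and chosen only for illustration.

```r
x <- c(1, 2, 3)
is.vector(x)                  # TRUE

# Names are the one attribute is.vector() tolerates.
names(x) <- c("a", "b", "c")
is.vector(x)                  # TRUE

# Any other attribute makes is.vector() return FALSE ...
attr(x, "units") <- "cm"
is.vector(x)                  # FALSE

# ... although the object is still an atomic vector.
is.atomic(x) || is.list(x)    # TRUE
```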

ii. Atomic vectors and lists

There are two types of vectors: lists and atomic vectors. The main difference between these two object types is that atomic vectors are sequences of data which are all the same type. Lists can contain multiple types of data.

There are four common types of atomic vectors: logical, integer, double, and character (R also has the rarer complex and raw types, which are not covered here). Logical values are TRUE, FALSE, or NA and are most often produced by comparisons. Integer and double are both numeric: integer vectors hold whole numbers, and double vectors hold real numbers in floating-point form.

logical_vector <- c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
is.logical(logical_vector)

## [1] TRUE

character_vector <- c("true", "false", "true", "false", "true")
is.logical(character_vector)

## [1] FALSE

is.character(character_vector)

## [1] TRUE

is.logical(as.logical(character_vector))

## [1] TRUE
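As a quick check of the four types, the sketch below creates one vector of each and asks R what it is. The vector names are made up for this example. It also shows that as.logical() returns NA for strings it does not recognize.

```r
lgl <- c(TRUE, FALSE, NA)
int <- c(1L, 2L, NA)
dbl <- c(1.5, 2, NA)
chr <- c("a", "b", NA)

# NA takes on the type of the vector it sits in.
typeof(lgl)        # "logical"
typeof(int)        # "integer"
typeof(dbl)        # "double"
typeof(chr)        # "character"

# Integer and double both count as numeric; logical does not.
is.numeric(int)    # TRUE
is.numeric(dbl)    # TRUE
is.numeric(lgl)    # FALSE

# Strings that as.logical() does not recognize become NA.
as.logical(c("true", "T", "yes"))   # TRUE TRUE NA
```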

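A list is a sequence of heterogeneous data. Create one with list(); c() would coerce every element to a single type. Use [[ ]] to extract an element.

x <- list("one", 1.2, 1, TRUE, c(1,2,3,4,5), c("hello", "world"))
x[[1]]

## [1] "one"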
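The difference between c() and list() is easy to miss, so here is a minimal sketch. When c() is given mixed types, it flattens its arguments and coerces them to one common type, while list() keeps each element as it is. The names mixed_c and mixed_list are hypothetical.

```r
# c() builds an atomic vector, so mixed inputs are coerced to one type.
typeof(c(TRUE, 1L))     # "integer"
typeof(c(1L, 1.5))      # "double"
typeof(c(1.5, "a"))     # "character"

mixed_c <- c("one", 1.2, TRUE, c(1, 2, 3))
typeof(mixed_c)         # "character"
length(mixed_c)         # 6: the nested c() is flattened

# list() keeps each element, including whole vectors, intact.
mixed_list <- list("one", 1.2, TRUE, c(1, 2, 3))
length(mixed_list)            # 4
sapply(mixed_list, typeof)    # "character" "double" "logical" "double"

# [ ] returns a shorter list; [[ ]] returns the element itself.
is.list(mixed_list[1])        # TRUE
mixed_list[[1]]               # "one"
```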

B. 2D data structures

If you think about an Excel spreadsheet, a vector would be a single column of values: the number of rows can vary, but there is only one column. Data frames and matrices are closer to the whole spreadsheet in that they have multiple columns as well as rows. Data frames are like lists in that they can hold multiple data types (though each column can only be of one type). Matrices must contain homogeneous data.

df <- data.frame(
  categorical = sample(c("a", "b", "c"), size = 300, replace = TRUE),
  double = rnorm(300, mean = 200, sd = 30),
  # floor() returns a double, so convert explicitly to get an integer column
  integer = as.integer(floor(rnorm(300, mean = 120, sd = 14))),
  logical = sample(c(TRUE, FALSE), size = 300, replace = TRUE)
)

head(df)

##   categorical   double integer logical
## 1           a 257.4975     139   FALSE
## 2           a 236.6566     151    TRUE
## 3           c 216.6820      99   FALSE
## 4           b 229.8264     119   FALSE
## 5           c 164.2863      92    TRUE
## 6           b 188.1514     135    TRUE
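To see the homogeneity rule for matrices in action, the sketch below uses a small hand-built data frame (small_df and its columns are invented for this example) and converts it to a matrix. stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version.

```r
small_df <- data.frame(
  name  = c("a", "b", "c"),
  score = c(1.5, 2.5, 3.5),
  count = 1:3,
  flag  = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Each column of a data frame keeps its own type.
sapply(small_df, typeof)
# name: "character", score: "double", count: "integer", flag: "logical"

# A matrix holds one type, so mixed columns are coerced to character.
m <- as.matrix(small_df)
typeof(m)      # "character"
dim(m)         # 3 4

# With only numeric columns, the matrix stays numeric.
num_m <- as.matrix(small_df[, c("score", "count")])
typeof(num_m)  # "double"
```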

================================================================================

Session information:

Last update on 2020-10-14

sessionInfo()

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_AT.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_AT.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_AT.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.29   R6_2.5.1        jsonlite_1.8.0  magrittr_2.0.3 
##  [5] evaluate_0.16   stringi_1.7.8   cachem_1.0.6    rlang_1.0.5    
##  [9] cli_3.3.0       rstudioapi_0.14 jquerylib_0.1.4 bslib_0.4.0    
## [13] rmarkdown_2.16  tools_4.2.1     stringr_1.4.1   xfun_0.32      
## [17] yaml_2.3.5      fastmap_1.1.0   compiler_4.2.1  htmltools_0.5.3
## [21] knitr_1.40      sass_0.4.2

================================================================================