Data Types and Structures in R
In this section, we will explore how R stores and organizes data.
Understanding data types and structures is essential because it determines what operations you can perform and how R interprets your variables.
All data in R belongs to some type (what kind of thing it is) and some structure (how it is arranged).
1. Data Types in R
The most common atomic data types in R are listed below, together with NA, the special marker for missing values:
| Type | Example | Description | JavaScript Equivalent |
|---|---|---|---|
| Numeric | x <- 3.14 | Real numbers with decimals | let x = 3.14; |
| Integer | x <- 5L | Whole numbers (note the L) | let x = 5; |
| Character | word <- "hello" | Text strings | let word = "hello"; |
| Logical | flag <- TRUE | TRUE/FALSE values | let flag = true; |
| NA | value <- NA | Missing value placeholder | let value = null; |
Checking a Variable’s Type
x <- 3.14
typeof(x)
class(x)
is.numeric(x)
You can test whether something is of a specific type using functions like is.numeric(), is.character(), or is.logical().
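The difference between typeof() and class() is easy to miss, so here is a minimal sketch that runs both on one value of each type. The `values` list is purely illustrative and not part of the notes above.

```r
# typeof() reports R's internal storage type; class() reports how R treats the object.
values <- list(3.14, 5L, "hello", TRUE)   # illustrative values, one of each type

for (v in values) {
  cat(format(v), ":", "typeof =", typeof(v), "| class =", class(v), "\n")
}

# The is.*() functions each return a single TRUE or FALSE
is.numeric(5L)       # TRUE: integers also count as numeric
is.character(3.14)   # FALSE
```

The only value where the two functions disagree here is 3.14: its storage type is "double" while its class is "numeric".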
- Numeric: Numeric values represent real numbers (e.g., 3.5, -4, 100). A number typed without the L suffix is stored as a double, even when it has no decimal part.
    x <- 3.14       # Numeric value
    typeof(x)       # "double"
    is.numeric(x)   # TRUE
    is.integer(x)   # FALSE
- Integer: Integers are whole numbers, created by appending an "L" (e.g., 5L, -2L).
    int <- 5L         # Integer
    typeof(int)       # "integer"
    class(int)        # "integer"
    is.numeric(int)   # TRUE
    is.integer(int)   # TRUE
- Character (String): Character values are text, or strings, enclosed in quotes ("Hello", 'World').
    word <- "hello!"   # Character
    typeof(word)       # "character"
- Logical (Boolean): Logical values are either TRUE or FALSE, often used for conditions and comparisons.
    flag <- TRUE       # Logical
    typeof(flag)       # "logical"
    flag2 <- F         # F is shorthand for FALSE
    typeof(flag2)      # "logical"
    is.logical(flag)   # TRUE
2. NA: Missing Data
Understanding NA Values in R
Missing data are an unavoidable part of real-world research.
In R, missing or undefined information is represented by a special constant: NA, which stands for Not Available.
What NA Means
NA indicates that a value should exist at that position in the dataset but is unknown or was not recorded.
scores <- c(90, 85, NA, 88)
scores
# [1] 90 85 NA 88
Unlike 0 or "" (empty string) or NULL (absence of any input), NA does not represent a valid number or character—it represents uncertainty.
Because R cannot assume what the missing value would have been, any calculation that involves NA also returns NA:
mean(scores)
# [1] NA
sum(scores)
# [1] NA
x + NA # adding 3.14 + NA
# [1] NA
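The distinction between NA, NULL, 0, and an empty string can be checked directly. The following is a small sketch; the names `with_na` and `with_null` are made up for illustration.

```r
# NA is a placeholder that still occupies a position; NULL is the absence of an object.
length(NA)      # 1
length(NULL)    # 0

with_na   <- c(1, NA, 3)     # length 3: the missing value keeps its slot
with_null <- c(1, NULL, 3)   # length 2: NULL simply disappears

is.na(0)        # FALSE: zero is a real, known value
is.na("")       # FALSE: an empty string is still a known string
is.na(NA)       # TRUE

# A comparison with NA is itself unknown, so test with is.na() rather than ==
NA == NA        # NA
```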
Handling Missing Values
Many R functions include an argument na.rm = TRUE (remove NAs) to ignore missing values explicitly.
mean(scores, na.rm = TRUE)
# [1] 87.66667
This tells R to exclude missing data before computing.
Checking for Missingness
Use is.na() to test which elements are missing:
is.na(scores)
# [1] FALSE FALSE TRUE FALSE
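Because is.na() returns a logical vector, it combines naturally with sum(), which(), and subsetting. A brief sketch using the same scores vector; `complete_scores` is an illustrative name.

```r
scores <- c(90, 85, NA, 88)   # same vector as in the notes above

sum(is.na(scores))            # number of missing values: 1
which(is.na(scores))          # position of the missing value: 3
mean(is.na(scores))           # proportion missing: 0.25

complete_scores <- scores[!is.na(scores)]   # keep only the observed values
complete_scores               # 90 85 88
```

Reporting how many values were missing, rather than only dropping them, keeps the decision visible in the analysis.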
Why NA Matters in Research
Missing data appear in nearly every behavioral or psychological dataset (skipped questions, dropped trials, incomplete responses).
Handling NA correctly protects statistical integrity: R forces explicit decisions rather than silently ignoring missing data.
Transparent handling of missingness helps maintain reproducibility and validity.
3. Data Structures
Data structures are collections of these atomic data types and can represent more complex data.
- Vector: A vector is a sequence of elements of the same type, making it the most basic data structure in R.
  - Vectors can hold any atomic type, but they must be homogeneous (all elements of the same type).
  - You can check the type of a vector with typeof() and the structure with str().
  - You can create a vector by listing numbers or "words", separated by commas, inside the concatenate function c(). Note that words must be in quotes.
    numeric_vector <- c(1.5, 2.3, 5.0)                   # Numeric vector
    character_vector <- c("apple", "banana", "cherry")   # Character vector
    typeof(character_vector)                             # Prints "character" in the console
- List: A list is a more flexible structure than a vector because it can hold elements of different types, including vectors, other lists, and even functions. Lists are heterogeneous.
    my_list <- list(1.5, "apple", TRUE, c(1, 2, 3))   # Mixed elements
- Matrix: A matrix is a two-dimensional collection of elements of the same type (usually numeric). You create a matrix with matrix() and define the number of rows and columns.
    my_matrix <- matrix(1:9, nrow = 3, ncol = 3)   # 3x3 matrix
- Array: An array is a multi-dimensional generalization of a matrix, capable of having more than two dimensions. Elements in an array must be of the same type.
    my_array <- array(1:12, dim = c(3, 2, 2))   # 3D array
- Data Frame: A data frame is a two-dimensional structure similar to a table or spreadsheet. Columns can have different types (e.g., numeric, character, logical), so a data frame is homogeneous within each column and heterogeneous across columns. You can access a column using $ notation, such as my_data$name, and view the first rows with functions like head().
    my_data <- data.frame(
      id = 1:3,
      name = c("Alice", "Bob", "Charlie"),
      score = c(85.5, 92.0, 88.5)
    )
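To make the homogeneous versus heterogeneous distinction concrete, the sketch below reuses the objects defined in the list above and adds one illustrative vector, `mixed`, that is not part of the original notes.

```r
# Vectors are homogeneous: mixing types coerces everything to one common type
mixed <- c(1.5, "apple", TRUE)
typeof(mixed)                 # "character"

# Lists keep each element's own type; [[ ]] extracts a single element
my_list <- list(1.5, "apple", TRUE, c(1, 2, 3))
typeof(my_list[[1]])          # "double"
typeof(my_list[[2]])          # "character"

# Matrices are indexed by [row, column]; matrix() fills column by column
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
my_matrix[2, 3]               # 8
dim(my_matrix)                # 3 3

# Data frames: each column is a vector with its own type
my_data <- data.frame(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie"),
  score = c(85.5, 92.0, 88.5)
)
my_data$score                 # 85.5 92.0 88.5
sapply(my_data, class)        # one class per column (name is "character" in R 4.0 and later)
```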
4. Special Data Types
R can also support a few additional data types that don't fall into the above categories.
- Factor: A factor represents categorical data and stores unique categories (levels). It’s commonly used for grouping and statistical modeling.
    colors <- factor(c("red", "green", "blue", "green", "red"))
- Function: Functions are first-class objects in R, meaning you can store them in variables, pass them as arguments, and return them from other functions. You define functions using the function() keyword.
    my_function <- function(x, y) {
      return(x + y)
    }
    my_function(3, 5)   # Calls the function and returns 8
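The sketch below looks inside the factor from the example above and shows a function being passed as an argument. The helper `apply_twice` is hypothetical and exists only for illustration.

```r
colors <- factor(c("red", "green", "blue", "green", "red"))

levels(colors)        # "blue" "green" "red" (sorted alphabetically by default)
table(colors)         # counts per level: blue 1, green 2, red 2
as.integer(colors)    # underlying integer codes: 3 2 1 2 3

# Functions can be handed to other functions like any other object
my_function <- function(x, y) {
  x + y               # the last evaluated expression is returned
}
apply_twice <- function(f, x) f(f(x, 1), 1)   # hypothetical helper: applies f twice, adding 1 each time
apply_twice(my_function, 3)                   # 5
```

A factor stores each observation as an integer code that points to a level, which is why as.integer() returns codes rather than the original text.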
5. Converting Between Types
R provides built-in functions for converting data types.
as.numeric("5")
as.character(123)
as.logical(0)
as.data.frame(matrix(1:6, nrow = 2))
If a conversion isn’t possible (e.g., "text" → numeric), R will return NA with a warning.
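A short sketch of what a failed conversion looks like in practice. The `responses` vector is hypothetical input, and suppressWarnings() is used here only to keep the console quiet; in real work the warning is a useful signal.

```r
# Hypothetical raw input: numbers stored as text, with one unusable entry
responses <- c("12", "7", "text")
converted <- suppressWarnings(as.numeric(responses))   # silences the coercion warning
converted               # 12  7 NA
sum(is.na(converted))   # 1 value could not be converted

as.integer(3.9)         # 3: the decimal part is dropped, not rounded
as.logical(c(0, 1, 2))  # FALSE TRUE TRUE: zero is FALSE, other numbers are TRUE
as.logical("yes")       # NA: only strings such as "TRUE", "true", or "T" convert
```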
6. Practical Example
Let’s create a small dataset and explore it.
# Create sample data
subject_id <- 1:20
rt <- c(470, 360, 665, 400, 445, 270, 500, 565, 350, 445, 275, NA, 600, 290, 560, 375, 450, 480, 325, 430)
congruent <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
color <- c("red", "blue", "blue", "green", "red", "red", "blue", "green", "blue", "green", "red", "blue", "green", "blue", "green", "red", "blue", "blue", "green", "red")
# Combine into data frame
data <- data.frame(subject_id, rt, congruent, color)
# Inspect
head(data) # Displays the top 6 observations — great for checking that your data imported or assigned correctly.
tail(data) # Displays the bottom 6 observations — useful for confirming structure and missing values at the end.
mean(data$rt) # Without na.rm = TRUE, mean() returns NA because R doesn’t ignore missing values by default.
mean(data$rt, na.rm=TRUE) # This tells R to remove any missing values before calculating the mean. The mean of the 19 non-missing RTs is about 434.47 ms.
summary(data) # Displays descriptive statistics for each variable (column) in the dataframe.
str(data) # Displays the structure of the dataframe — its dimensions, variable types, and a sample of values.
Example output:
> head(data)
  subject_id  rt congruent color
1          1 470      TRUE   red
2          2 360      TRUE  blue
3          3 665     FALSE  blue
4          4 400      TRUE green
5          5 445     FALSE   red
6          6 270      TRUE   red
> str(data)
'data.frame': 20 obs. of  4 variables:
 $ subject_id: int  1 2 3 4 5 6 7 8 9 10 ...
 $ rt        : num  470 360 665 400 445 270 500 565 350 445 ...
 $ congruent : logi  TRUE TRUE FALSE TRUE FALSE TRUE ...
 $ color     : chr  "red" "blue" "blue" "green" ...
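A natural next step with this dataset is to compare reaction times between congruent and incongruent trials. The following is one possible sketch using base R's tapply(); it recreates only the two columns it needs so that it runs on its own, and `group_means` is an illustrative name.

```r
# Recreate the two columns used here (same values as in the example above)
rt <- c(470, 360, 665, 400, 445, 270, 500, 565, 350, 445,
        275, NA, 600, 290, 560, 375, 450, 480, 325, 430)
congruent <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE,
               TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
data <- data.frame(rt, congruent)

sum(is.na(data$rt))   # 1 missing reaction time

# Mean RT per condition; na.rm = TRUE is passed through to mean()
group_means <- tapply(data$rt, data$congruent, mean, na.rm = TRUE)
group_means           # FALSE: 507.5, TRUE: about 381.36
```

Without na.rm = TRUE, the mean for the incongruent (FALSE) group would be NA, because the one missing trial belongs to that condition.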
7. Summary
- R stores information using data types (numeric, character, logical, etc.) and data structures (vectors, lists, data frames, etc.).
- Vectors are one-dimensional; data frames and matrices are two-dimensional.
- Lists can hold mixed data types.
- Data frames are the foundation for data analysis in R.
- Always check structure with str() and summary() before analysis.
Understanding data types and structures will make your future work with data manipulation, visualization, and modeling in R much easier.
Summary of R Data Types and Structures
| Type | Description | Most Important (for our purposes) |
|---|---|---|
| Atomic Types | Numeric, Integer, Character, Logical, NA | ** |
| Vector | 1D, homogeneous data (all same type) | |
| List | 1D, heterogeneous data (can store multiple types) | |
| Matrix | 2D, homogeneous data | |
| Array | Multi-dimensional, homogeneous data | |
| Data Frame | 2D, heterogeneous data (columns can vary in type) | ** |
| Factor | Categorical data with unique levels | ** |
| Function | First-class objects for performing tasks | ** |