PSY 1903 Programming for Psychologists

Data Types and Structures in R

In this section, we will explore how R stores and organizes data.
Understanding data types and structures is essential because it determines what operations you can perform and how R interprets your variables.

All data in R belongs to some type (what kind of thing it is) and some structure (how it is arranged).


1. Data Types in R

The most common atomic data types in R are:

Type       Example            Description                  JavaScript Equivalent
Numeric    x <- 3.14          Real numbers with decimals   let x = 3.14;
Integer    x <- 5L            Whole numbers (note the L)   let x = 5;
Character  word <- "hello"    Text strings                 let word = "hello";
Logical    flag <- TRUE       TRUE/FALSE values            let flag = true;
NA         value <- NA        Missing value placeholder    let value = null;

Checking a Variable’s Type

x <- 3.14
typeof(x)
class(x)
is.numeric(x)

You can test whether something is of a specific type using functions like is.numeric(), is.character(), or is.logical().
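The difference between typeof() and class() is easiest to see side by side. The following is a minimal sketch, assuming a hypothetical list named values that holds one example of each basic type; the variable names are introduced here for illustration only.

```r
# One example value per basic type (hypothetical values for illustration)
values <- list(3.14, 5L, "hello", TRUE)

# typeof() reports how R stores the value internally
types <- sapply(values, typeof)
# "double" "integer" "character" "logical"

# class() reports how R treats the value; a double has class "numeric"
classes <- sapply(values, class)
# "numeric" "integer" "character" "logical"

# is.numeric() is TRUE for both doubles and integers
numeric_like <- sapply(values, is.numeric)
# TRUE TRUE FALSE FALSE

print(types)
print(classes)
print(numeric_like)
```

Note that typeof(3.14) is "double" while class(3.14) is "numeric": the two functions answer slightly different questions about the same value.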

  • Numeric: Numeric values represent real numbers (e.g., 3.5, -4, 100). They can be whole or decimal numbers, and R stores them as doubles by default.

    x <- 3.14       # Numeric value
    typeof(x)       # "double"
    is.numeric(x)   # TRUE
    is.integer(x)   # FALSE
    
  • Integer: Integers are whole numbers, created by appending an "L" (e.g., 5L, -2L).

    int <- 5L        # Integer
    typeof(int)      # "integer"
    class(int)       # "integer"
    is.numeric(int)  # TRUE
    is.integer(int)  # TRUE
    
  • Character (String): Character values are text, or strings, enclosed in quotes ("Hello", 'World').

    word <- "hello!"  # Character
    typeof(word)      # "character"
    
  • Logical (Boolean): Logical values are either TRUE or FALSE, often used for conditions and comparisons.

    flag <- TRUE      # Logical
    typeof(flag)      # "logical"
    flag2 <- F        # F is shorthand for FALSE; spelling out FALSE is safer because F can be reassigned
    typeof(flag2)     # "logical"
    is.logical(flag)  # TRUE
    

2. NA: Missing Data

Understanding NA Values in R

Missing data are an unavoidable part of real-world research.
In R, missing or undefined information is represented by a special constant: NA, which stands for Not Available.

What NA Means

NA indicates that a value belongs at that position in the dataset but is unknown or was not recorded.

scores <- c(90, 85, NA, 88)
scores
# [1] 90 85 NA 88

Unlike 0, "" (an empty string), or NULL (the absence of any value), NA is not a valid number or character; it marks a value that is unknown. Because R cannot assume what the value would have been, any calculation that involves an NA returns NA, so missingness propagates through your results:

mean(scores)
# [1] NA
sum(scores)
# [1] NA
3.14 + NA  # any arithmetic involving NA yields NA
# [1] NA
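
The contrast between NA, NULL, 0, and "" can be checked directly. This is a small sketch under the assumption that the variable names below (missing_value, nothing, and so on) are hypothetical and used only for illustration.

```r
missing_value <- NA     # one value that is unknown
nothing       <- NULL   # no value at all
zero          <- 0      # a real, known number
empty_string  <- ""     # a real, known (empty) string

length(missing_value)   # 1: the slot exists, its content is unknown
length(nothing)         # 0: there is no slot
is.na(zero)             # FALSE
is.na(empty_string)     # FALSE

# NA keeps its place in a vector; NULL disappears
with_na   <- c(1, NA, 3)
with_null <- c(1, NULL, 3)
length(with_na)         # 3
length(with_null)       # 2

# A bare NA is stored as a logical value until it joins other data
typeof(NA)              # "logical"
typeof(with_na)         # "double"
```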

Handling Missing Values

Many R functions include an argument na.rm = TRUE (remove NAs) to ignore missing values explicitly.

mean(scores, na.rm = TRUE)
# [1] 87.66667

This tells R to exclude missing data before computing.

Checking for Missingness

Use is.na() to test which elements are missing:

is.na(scores)
# [1] FALSE FALSE  TRUE FALSE
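
Because is.na() returns a logical vector, it can be combined with other base R functions to count, locate, or drop missing values. A brief sketch, reusing the scores vector from above; the result names (n_missing, which_missing, observed) are hypothetical.

```r
scores <- c(90, 85, NA, 88)

# TRUE counts as 1, so summing the logical vector counts the NAs
n_missing <- sum(is.na(scores))       # 1

# Positions of the missing values
which_missing <- which(is.na(scores)) # 3

# Keep only the observed (non-missing) values
observed <- scores[!is.na(scores)]    # 90 85 88

# Averaging the observed values matches mean(scores, na.rm = TRUE)
mean(observed)
```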

Why NA Matters in Research

Missing data appear in nearly every behavioral or psychological dataset (skipped questions, dropped trials, incomplete responses). Handling NA correctly protects statistical integrity: R forces you to make an explicit decision rather than silently ignoring missing data. Transparent handling of missingness helps maintain reproducibility and validity.


3. Data Structures

Data structures are collections of these atomic data types and can represent more complex data.

  • Vector: A vector is a sequence of elements of the same type, making it the most basic data structure in R.

    • Vectors can hold any atomic type, but they must be homogeneous (all elements of the same type).
    • You can check the type of a vector with typeof() and the structure with str().
    • You can create a vector by listing numbers or "words", separated by commas, inside the combine function c(). Note that words must be in quotes.
    numeric_vector <- c(1.5, 2.3, 5.0)  # Numeric vector
    character_vector <- c("apple", "banana", "cherry")  # Character vector
    typeof(character_vector) # Will output "character" in the console window
    
  • List: A list is a more flexible structure than a vector because it can hold elements of different types, including vectors, other lists, and even functions. Lists are heterogeneous.

    my_list <- list(1.5, "apple", TRUE, c(1, 2, 3))  # Mixed elements
    
  • Matrix: A matrix is a two-dimensional collection of elements of the same type (usually numeric). You create a matrix with matrix() and define the number of rows and columns.

    my_matrix <- matrix(1:9, nrow = 3, ncol = 3)  # 3x3 matrix
    
  • Array: An array is a multi-dimensional generalization of a matrix, capable of having more than two dimensions. Elements in an array must be of the same type.

    my_array <- array(1:12, dim = c(3, 2, 2))  # 3D array
    
  • Data Frame: A data frame is a two-dimensional structure similar to a table or spreadsheet. Columns in a data frame can have different types (e.g., numeric, character, logical), but every value within a single column shares one type, so a data frame is homogeneous within each column and heterogeneous across columns. You can access columns using $ notation such as my_data$name, and view data with functions like head().

    my_data <- data.frame(
      id = 1:3,
      name = c("Alice", "Bob", "Charlie"),
      score = c(85.5, 92.0, 88.5)
    )
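
Once these structures exist, the next step is usually pulling values back out of them. The sketch below rebuilds the objects defined above and shows one common way to index each; the result names (second_value, mixed, high_scorers, etc.) are hypothetical and chosen only for illustration.

```r
numeric_vector <- c(1.5, 2.3, 5.0)
my_list   <- list(1.5, "apple", TRUE, c(1, 2, 3))
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
my_data   <- data.frame(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie"),
  score = c(85.5, 92.0, 88.5)
)

# Vector: single brackets select by position (R counts from 1)
second_value <- numeric_vector[2]          # 2.3

# Vectors are homogeneous, so mixing types forces a common type
mixed <- c(1.5, "apple", TRUE)
typeof(mixed)                              # "character"

# List: double brackets extract the element itself
second_item <- my_list[[2]]                # "apple"

# Matrix: [row, column]; matrix() fills column by column by default
cell <- my_matrix[2, 3]                    # 8
dim(my_matrix)                             # 3 3

# Data frame: $ selects a column; a logical test selects rows
second_name  <- my_data$name[2]            # Bob
high_scorers <- my_data[my_data$score > 86, ]
nrow(high_scorers)                         # 2
```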
    

4. Special Data Types

R can also support a few additional data types that don't fall into the above categories.

  • Factor: A factor represents categorical data and stores unique categories (levels). It’s commonly used for grouping and statistical modeling.

    colors <- factor(c("red", "green", "blue", "green", "red"))
    
  • Function: Functions are first-class objects in R, meaning you can store them in variables, pass them as arguments, and return them from other functions. You define functions using the function keyword.

    my_function <- function(x, y) {
      return(x + y)
    }
    my_function(3, 5)  # Calls the function and returns 8
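
Both ideas above can be explored a little further with base R. The sketch below inspects the levels of the colors factor and then treats functions as values; make_adder and add_ten are hypothetical helpers introduced only for this illustration.

```r
colors <- factor(c("red", "green", "blue", "green", "red"))

# Levels are the unique categories, sorted alphabetically by default
levels(colors)              # "blue" "green" "red"

# table() counts how often each level occurs
color_counts <- table(colors)

# Internally, each value is stored as an integer code pointing to a level
codes <- as.integer(colors) # 3 2 1 2 3

my_function <- function(x, y) {
  return(x + y)
}

# Passing a function as an argument: mapply() applies it pairwise
sums <- mapply(my_function, 1:3, 4:6)   # 5 7 9

# Returning a function from a function (hypothetical helper)
make_adder <- function(n) {
  function(x) x + n
}
add_ten <- make_adder(10)
add_ten(5)                  # 15
```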
    

5. Converting Between Types

R provides built-in functions for converting data types.

as.numeric("5")                       # 5
as.character(123)                     # "123"
as.logical(0)                         # FALSE (0 is FALSE; any other number is TRUE)
as.data.frame(matrix(1:6, nrow = 2))  # 2 rows, 3 columns named V1, V2, V3

If a conversion isn’t possible (e.g., "text" → numeric), R will return NA with a warning.
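
A failed conversion can be detected after the fact by checking for NA. The following is a rough sketch, assuming a hypothetical character vector raw_input such as might come from a survey export; suppressWarnings() is used here only to keep the console quiet, and without it R prints a coercion warning.

```r
# Hypothetical raw responses read in as text
raw_input <- c("12", "7.5", "n/a")

# "n/a" cannot become a number, so it is converted to NA
converted <- suppressWarnings(as.numeric(raw_input))
converted                               # 12.0  7.5   NA

# Identify which original entries failed to convert
failed <- raw_input[is.na(converted)]   # "n/a"

# Two other conversions worth knowing
truncated <- as.integer(3.9)            # 3: the decimal part is dropped, not rounded
unknown   <- as.logical("yes")          # NA: "yes" is not a recognized logical string
```

Checking failed before analysis makes it clear whether an NA came from genuinely missing data or from an entry that was typed in an unexpected format.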


6. Practical Example

Let’s create a small dataset and explore it.

# Create sample data
subject_id <- 1:20
rt <- c(470, 360, 665, 400, 445, 270, 500, 565, 350, 445, 275, NA, 600, 290, 560, 375, 450, 480, 325, 430)
congruent <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
color <- c("red", "blue", "blue", "green", "red", "red", "blue", "green", "blue", "green", "red", "blue", "green", "blue", "green", "red", "blue", "blue", "green", "red")

# Combine into data frame
data <- data.frame(subject_id, rt, congruent, color)

# Inspect
head(data)    # Displays the top 6 observations — great for checking that your data imported or assigned correctly.
tail(data)    # Displays the bottom 6 observations — useful for confirming structure and missing values at the end.
mean(data$rt) # Without na.rm = TRUE, mean() returns NA because R doesn’t ignore missing values by default.
mean(data$rt, na.rm = TRUE) # Removes missing values before calculating; the mean of the 19 recorded RTs is about 434.47 ms.
summary(data) # Displays descriptive statistics for each variable (column) in the dataframe.
str(data)     # Displays the structure of the dataframe — its dimensions, variable types, and a sample of values.

Example output:

> head(data)
  subject_id  rt congruent color
1          1 470      TRUE   red
2          2 360      TRUE  blue
3          3 665     FALSE  blue
4          4 400      TRUE green
5          5 445     FALSE   red
6          6 270      TRUE   red

> str(data)
'data.frame':	20 obs. of  4 variables:
 $ subject_id: int  1 2 3 4 5 6 7 8 9 10 ...
 $ rt        : num  470 360 665 400 445 270 500 565 350 445 ...
 $ congruent : logi  TRUE TRUE FALSE TRUE FALSE TRUE ...
 $ color     : chr  "red" "blue" "blue" "green" ...
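
A natural next step with this dataset is to compare reaction times across conditions. The sketch below is one possible approach using base R's tapply(); it rebuilds only the rt and congruent columns so that it runs on its own, and the names rt_data, mean_by_condition, n_valid, and fast_trials are hypothetical.

```r
rt <- c(470, 360, 665, 400, 445, 270, 500, 565, 350, 445, 275, NA, 600, 290, 560, 375, 450, 480, 325, 430)
congruent <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
rt_data <- data.frame(rt, congruent)

# Mean RT within each level of congruent, ignoring the missing trial
mean_by_condition <- tapply(rt_data$rt, rt_data$congruent, mean, na.rm = TRUE)
mean_by_condition
#    FALSE     TRUE
# 507.5000 381.3636

# Number of non-missing RTs that went into each mean
n_valid <- tapply(!is.na(rt_data$rt), rt_data$congruent, sum)
n_valid   # FALSE: 8, TRUE: 11

# Rows with an RT under 300 ms (the NA row is excluded explicitly)
fast_trials <- rt_data[!is.na(rt_data$rt) & rt_data$rt < 300, ]
nrow(fast_trials)   # 3
```

Reporting n_valid alongside the means keeps the handling of the missing trial transparent.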

7. Summary

  • R stores information using data types (numeric, character, logical, etc.) and data structures (vectors, lists, data frames, etc.).
  • Vectors are one-dimensional; data frames and matrices are two-dimensional.
  • Lists can hold mixed data types.
  • Data frames are the foundation for data analysis in R.
  • Always check structure with str() and summary() before analysis.

Understanding data types and structures will make your future work with data manipulation, visualization, and modeling in R much easier.

Summary of R Data Types and Structures

Type          Description                                         Most Important (for our purposes)
Atomic Types  Numeric, Integer, Character, Logical, NA            **
Vector        1D, homogeneous data (all same type)
List          1D, heterogeneous data (can store multiple types)
Matrix        2D, homogeneous data
Array         Multi-dimensional, homogeneous data
Data Frame    2D, heterogeneous data (columns can vary in type)   **
Factor        Categorical data with unique levels                 **
Function      First-class objects for performing tasks            **