Indexing and Subsetting in R
In this section, we’ll explore how to access and manipulate specific parts of your data using indexing and subsetting.
These skills allow you to extract or modify elements of vectors, lists, matrices, and data frames efficiently.
1. Indexing Basics
Indexing means selecting elements from a data object using their position, names, or logical conditions.
R uses square brackets [] for indexing.
Example with a Vector
fruits <- c("apple", "banana", "cherry", "date")
fruits[1] # first element
fruits[2:4] # elements 2 through 4
fruits[-1] # all but the first element
R is 1-indexed, meaning counting starts at 1 (not 0, as in some other languages).
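The same bracket syntax also works on the left side of an assignment, which is how elements are modified. The following is a minimal sketch that reuses the fruits vector; the replacement values are invented for illustration.

```r
fruits <- c("apple", "banana", "cherry", "date")

# Replace a single element by position
fruits[2] <- "blueberry"

# Replace several elements at once
fruits[3:4] <- c("cranberry", "dragonfruit")

# Asking for a position past the end returns NA rather than an error
beyond_end <- fruits[10]

# length() gives the last valid index
last_fruit <- fruits[length(fruits)]
```

Because an out-of-range index returns NA silently, it can be worth checking length() before indexing with a computed position.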
JavaScript Comparison
let fruits = ["apple", "banana", "cherry", "date"];
console.log(fruits[0]); // first element in JS
2. Logical Indexing
You can subset data using logical values (TRUE or FALSE).
Only elements corresponding to TRUE will be selected.
nums <- c(5, 10, 15, 20)
nums[c(TRUE, FALSE, TRUE, FALSE)] # selects 5 and 15
nums[nums > 10] # selects elements greater than 10
Logical indexing is powerful for filtering data.
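Missing values deserve some care here, because a comparison involving NA yields NA rather than TRUE or FALSE. The sketch below uses a variant of nums with an NA added (an assumption made for illustration) to show how which() and is.na() help.

```r
nums <- c(5, 10, 15, NA, 20)

# A comparison returns a logical vector the same length as nums
is_large <- nums > 10

# An NA in the logical index produces an NA in the result
with_na <- nums[is_large]

# which() converts TRUE positions to integer indices and skips NA
positions <- which(is_large)
without_na <- nums[positions]

# Combine conditions element-wise with & (and), | (or), and ! (not)
mid_range <- nums[!is.na(nums) & nums >= 10 & nums <= 15]
```

Here with_na contains 15, NA, 20, while without_na contains only 15 and 20.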
3. Indexing by Name
If a vector or list has names, you can access elements by those names.
scores <- c(math = 90, english = 85, science = 92)
scores["math"]
scores[c("math", "science")]
Named elements can still be accessed by position, so both styles work on the same vector:
scores[1]
scores["english"]
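Names can also be read, assigned to, and used in conditions. The sketch below extends the scores vector; the "history" and "art" entries are hypothetical and only there to show what happens with new or missing names.

```r
scores <- c(math = 90, english = 85, science = 92)

# names() returns the labels as a character vector
subject_names <- names(scores)

# Assigning to an existing name updates it; a new name appends an element
scores["math"] <- 95
scores["history"] <- 78

# A name that does not exist returns NA instead of raising an error
missing_score <- scores["art"]

# Names can drive logical indexing too
no_english <- scores[names(scores) != "english"]
```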
4. Subsetting Lists
Lists can contain elements of different types: numbers, strings, vectors, or even other lists.
student <- list(
  name = "Alex",
  age = 20,
  scores = c(88, 92, 95)
)
Access elements with $ or double brackets [[]]:
student$name
student[["age"]]
student$scores[2]
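Single brackets also work on lists, but they behave differently from double brackets: [] returns a smaller list, while [[]] returns the element itself. A short sketch of the difference, reusing the student list (the field variable is a hypothetical name used for illustration):

```r
student <- list(name = "Alex", age = 20, scores = c(88, 92, 95))

# Single brackets keep the list wrapper: the result is a list of length 1
age_as_list <- student["age"]

# Double brackets (and $) extract the element itself
age_value <- student[["age"]]

# Single brackets can select several elements at once
name_and_age <- student[c("name", "age")]

# Double brackets accept a variable holding the name; $ takes the name literally
field <- "scores"
second_score <- student[[field]][2]
```

A common mistake is to write student["age"] + 1, which fails because arithmetic is not defined for lists; student[["age"]] + 1 works.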
JavaScript Comparison
let student = { name: "Alex", age: 20, scores: [88, 92, 95] };
console.log(student.name);
console.log(student.scores[1]);
5. Indexing Matrices
Matrices are two-dimensional, so you index them with two positions inside the brackets: [row, column].
m <- matrix(1:9, nrow = 3, byrow = TRUE)
m
m[1, 2] # row 1, column 2
m[ , 3] # all rows, column 3
m[2, ] # entire second row
You can also use negative indices to exclude specific rows or columns:
m[-1, ] # exclude the first row
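Selecting a single row or column returns a plain vector by default, which can surprise code that expects a matrix. The sketch below shows the drop = FALSE argument along with a few other selections on the same matrix m.

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# Selecting one row drops the result to a plain vector
row_vec <- m[2, ]

# drop = FALSE keeps the result as a 1 x 3 matrix
row_mat <- m[2, , drop = FALSE]

# Select a block: rows 1-2, columns 1 and 3
block <- m[1:2, c(1, 3)]

# A logical condition picks out matching cells as a vector
big_values <- m[m > 6]
```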
6. Subsetting Data Frames
Data frames are the most common structure you’ll work with in R.
They behave similarly to matrices but can hold columns of different types.
df <- data.frame(
  id = 1:4,
  name = c("Alice", "Bob", "Carmen", "Diego"),
  score = c(88, 92, 95, 90)
)
# Subset by position
df[1, ] # first row
df[, 2] # second column (returned as a vector)
df[1:2, c("id", "score")] # rows 1–2, specific columns
Access by column name using $:
df$name
df$score
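A data frame is a list of columns, so the list rules for [] and [[]] apply here too. The sketch below compares the forms that return a vector with those that return a one-column data frame, using the same df.

```r
df <- data.frame(
  id = 1:4,
  name = c("Alice", "Bob", "Carmen", "Diego"),
  score = c(88, 92, 95, 90)
)

# These two return the column as a plain vector
score_vec <- df$score
score_vec2 <- df[["score"]]

# These two return a data frame with a single column
score_df <- df["score"]
score_df2 <- df[, "score", drop = FALSE]
```

Which form is appropriate depends on what the next step expects: functions such as mean() need the vector, while code that keeps working with rows and columns usually wants the data frame.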
Equivalent in JavaScript using an array of objects:
let df = [
{id: 1, name: "Alice", score: 88},
{id: 2, name: "Bob", score: 92}
];
console.log(df[0].score);
7. Conditional Subsetting
You can use logical conditions to filter rows.
df[df$score > 90, ] # rows where score > 90
df[df$name == "Alice", ] # rows where name is Alice
df[df$score >= 90 & df$name != "Bob", ]
You can also create a logical vector first and reuse it:
high_scores <- df$score > 90
df[high_scores, ]
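When the filtered column contains missing values, the condition evaluates to NA for those rows and the result includes rows filled with NA. The sketch below assumes a version of df in which Bob's score is missing (an invented change for illustration) and shows two ways to avoid that, plus %in% for matching several values.

```r
df <- data.frame(
  id = 1:4,
  name = c("Alice", "Bob", "Carmen", "Diego"),
  score = c(88, NA, 95, 90)
)

# The NA score makes the condition NA, which yields an all-NA row
with_na_row <- df[df$score > 89, ]

# which() keeps only the rows where the condition is TRUE
clean_rows <- df[which(df$score > 89), ]

# subset() also drops NA conditions and lets you omit the df$ prefix
same_rows <- subset(df, score > 89)

# %in% tests membership against several values at once
selected <- df[df$name %in% c("Alice", "Diego"), ]
```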
8. Adding and Removing Columns or Rows
Add a new column using $:
df$passed <- df$score >= 90
df
Remove a column by assigning NULL:
df$passed <- NULL
df
Add a new row using rbind():
new_row <- data.frame(id = 5, name = "Eva", score = 93)
df <- rbind(df, new_row)
Remove rows by negative indexing:
df <- df[-1, ] # removes first row
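Two details are easy to miss when adding and removing: several columns can be dropped in one step, and removed rows leave their old row names behind. A minimal sketch on a fresh copy of df; the grade column and its cutoff are hypothetical.

```r
df <- data.frame(
  id = 1:4,
  name = c("Alice", "Bob", "Carmen", "Diego"),
  score = c(88, 92, 95, 90)
)

# Add a column with double brackets (equivalent to df$grade <- ...)
df[["grade"]] <- ifelse(df$score >= 90, "A", "B")

# Drop several columns at once by keeping every column not in the list
df_small <- df[, !(names(df) %in% c("id", "grade"))]

# Removing rows keeps the original row names of the remaining rows
df_trimmed <- df[-c(1, 2), ]
old_labels <- rownames(df_trimmed)

# Reset the row names so they match positions again
rownames(df_trimmed) <- NULL
```

After the first two rows are removed, old_labels holds "3" and "4"; resetting the row names renumbers them from 1.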
9. Advanced Subsetting
You can combine conditions, names, and indices for precise control.
df[df$score > 90 & df$id < 4, c("name", "score")]
Or select specific columns programmatically:
columns_to_keep <- c("id", "score")
df[ , columns_to_keep]
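The index vector does not have to be typed by hand; it can be computed from the data frame itself. The sketch below shows three such cases on a fresh copy of df, using the base functions sapply(), grep(), and order().

```r
df <- data.frame(
  id = 1:4,
  name = c("Alice", "Bob", "Carmen", "Diego"),
  score = c(88, 92, 95, 90)
)

# Select columns by type: sapply() returns one TRUE/FALSE per column
numeric_cols <- sapply(df, is.numeric)
df_numeric <- df[, numeric_cols]

# Select columns whose names match a pattern
s_cols <- grep("^s", names(df), value = TRUE)

# Reorder rows: order() returns the row positions sorted by score
df_sorted <- df[order(df$score, decreasing = TRUE), ]
```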
10. Practical Example
Let’s apply everything to a small example from a reaction time experiment.
# Create a data frame
data <- data.frame(
  subject_id = 1:5,
  rt = c(520, 410, 615, 450, 395),
  congruent = c(TRUE, TRUE, FALSE, TRUE, FALSE)
)
# Subset only congruent trials
congruent_trials <- data[data$congruent, ] # congruent is already logical, so == TRUE is not needed
# Subset fast trials (RT < 500)
fast_trials <- data[data$rt < 500, ]
# Subset specific columns
subset_cols <- data[, c("subject_id", "rt")]
For reference, printing data shows the full data frame before any subsetting:
  subject_id  rt congruent
1          1 520      TRUE
2          2 410      TRUE
3          3 615     FALSE
4          4 450      TRUE
5          5 395     FALSE
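Subsets like these are usually a step toward a summary. As a sketch of where this might go next, the code below computes the mean reaction time per condition, combines two conditions, and flags slow trials; the 500 ms cutoff and the slow column are assumptions carried over from the "fast trials" example, not part of any standard analysis.

```r
data <- data.frame(
  subject_id = 1:5,
  rt = c(520, 410, 615, 450, 395),
  congruent = c(TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Mean RT per condition, using a logical index on the rt column
mean_congruent <- mean(data$rt[data$congruent])
mean_incongruent <- mean(data$rt[!data$congruent])

# Rows that are both congruent and fast
fast_congruent <- data[data$congruent & data$rt < 500, ]

# Flag slow trials in a new logical column
data$slow <- data$rt >= 500
```

With this data, mean_congruent is 460 and mean_incongruent is 505.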
11. Summary
- Use [] to extract elements from vectors, lists, matrices, and data frames.
- Use $ or [["name"]] for named list or data frame elements.
- Logical subsetting filters data based on conditions.
- Negative indices remove elements.
- Combine multiple methods for flexible data manipulation.
Mastering indexing and subsetting makes your R workflow efficient and precise, which is essential for data wrangling, cleaning, and analysis.