PSY 1903 Programming for Psychologists


Week 10: Debugging and Using AI Tools


0. Setting Up Your Debugging Project

To follow along with this week’s lesson and examples, you’ll create a clean RStudio Project specifically for debugging practice.
This ensures your environment is reproducible and separate from other work.


Step 1: Create the Directory Structure

Adapt the path in the first line to match your computer’s psy1903 directory.
Copy and paste this code into your R console (not a script):

setwd("/Users/yourusername/Desktop/psy1903/web/") # Update with your own path

dir.create("debug_example")
dir.create("debug_example/data")
dir.create("debug_example/data/raw")
dir.create("debug_example/data/cleaned")
dir.create("debug_example/scripts")
dir.create("debug_example/output")
dir.create("debug_example/output/plots")
dir.create("debug_example/output/tables")
dir.create("debug_example/reports")

Step 2: Create a New R Project

  1. Go to: File → New Project → Existing Directory
  2. Browse to:
    /Users/yourusername/Desktop/psy1903/web/debug_example
  3. Check “Open in new session”
  4. Click Create Project

You should now see a file called debug_example.Rproj inside that folder.


Step 3: Confirm Your Working Directory

In the Console, run:

getwd()

It should return a path ending in /psy1903/web/debug_example.

RStudio opens a new session for each project — meaning each has its own Console, Environment, and Files pane.
This isolation is crucial for debugging because it prevents hidden variables from interfering with your work.


1. Creating a Quarto Debugging Report

You’ll now create a Quarto document to work through debugging examples reproducibly.

  1. Make sure you’re in the debug_example project (look for debug_example.Rproj in the top-right of RStudio).
  2. In the Files pane, open the reports/ folder.
  3. Go to File → New File → Quarto Document.
  4. Set:
    • Title: Week 10 Debugging Exercise
    • Author: Your Name
    • Output Format: HTML
    • Uncheck “Use visual markdown editor.”
  5. Click Create Empty Document.
  6. Save it as:
    reports/debug_example.qmd

Edit the YAML Header

Replace the top section of your file with this:

---
title: "Week 10 Debugging and Refactoring"
author: "Your Name"
format: html
execute:
  echo: true
  warning: true
  message: true
---
  • echo: true — shows code and results together (great for learning and sharing).
  • warning: true — displays any warnings (useful for transparency).
  • message: true — prints messages from R (like package loads).
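
These are document-wide defaults; you can override them for a single chunk with Quarto’s hash-pipe options on the first lines inside that chunk, for example:

#| warning: false
#| message: false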

2. Creating a Debug Dataset

Use this code to simulate a dataset that mimics a small reaction-time experiment (similar to an IAT).

set.seed(1903)

iat_data <- data.frame(
  subject_id = 1:40,
  rt = round(rnorm(40, mean = 520, sd = 120)),
  congruent = sample(c(TRUE, FALSE), 40, replace = TRUE),
  condition = sample(c("control", "incongruent"), 40, replace = TRUE)
)

# Randomly assign two NA values in the rt column
iat_data$rt[sample(1:nrow(iat_data), 2)] <- NA

# Save for later use
saveRDS(iat_data, file = "data/raw/iat_data.rds")

The dataset contains four columns:

  • subject_id: unique participant ID (1–40).
  • rt: mean reaction time in milliseconds; smaller = faster. Some values are NA.
  • congruent: logical TRUE/FALSE — whether trials were matched or mismatched.
  • condition: experimental condition (“control” or “incongruent”).
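
Before moving on, it’s worth confirming the dataset matches this description:

# Quick sanity checks on the simulated data
str(iat_data)               # column names and types
summary(iat_data$rt)        # range of rt, including the NA count
table(iat_data$condition)   # how many participants per condition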

3. Why Debugging Matters

Debugging means systematically finding and fixing errors in your code. It’s not a sign of failure; it’s how you understand what your code is really doing.

Even professional coders spend much of their time debugging, testing, and refining code. In fact, debugging is one of the most effective ways to learn how R thinks. Every error message tells you something about how your code is being interpreted, whether it’s about data types, variable scope, or syntax structure.

When something breaks, you’re forced to slow down, trace your logic, and confirm that each step does what you think it does. This process helps you spot not only the immediate bug but also larger issues with code organization or reproducibility.

Good debugging habits lead to cleaner, more reliable scripts and make collaboration easier because well-tested code is easier for others (and your future self) to understand and trust.


Example 1: Missing Object

summary(iat_data_summary)

Error: object 'iat_data_summary' not found

Fix:

iat_data_summary <- summary(iat_data)

R is telling us that the object doesn’t exist yet.
In Quarto, this happens often because rendering starts from a clean environment.


Example 2: NA Propagation

mean(iat_data$rt)

Output:

[1] NA

Fix:

mean(iat_data$rt, na.rm = TRUE)

Missing values (NA) propagate: if any value in the vector is NA, mean() returns NA.
The argument na.rm = TRUE tells R to remove missing cases before computing.

Dealing with NA Values:

The best practice is to use na.rm = TRUE in any function that supports it. While na.omit() is a quick way to remove missing values, use it sparingly: on a data frame it drops an entire row if any column in that row contains an NA, which can unintentionally reduce your sample size and bias your analysis, especially if missingness isn’t random. More targeted approaches include:

  • na.rm = TRUE inside functions like mean() or sum(), to ignore NAs only where necessary;
  • is.na() to selectively handle or replace missing values;
  • data-cleaning functions such as dplyr::mutate() with ifelse(), or tidyr::replace_na(), to impute or flag missing data.

These methods preserve more of your dataset and keep your results transparent and reproducible. How to handle missingness is ultimately a statistics/analysis decision, and many labs have different preferences, so we will not go into more depth for the purposes of PSY 1903.
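
As a quick illustration of those targeted approaches (a sketch, assuming the iat_data created above; requires the dplyr and tidyr packages):

library(dplyr)
library(tidyr)

iat_data <- iat_data |>
  mutate(
    rt_missing = is.na(rt),                               # flag missing cases
    rt_imputed = replace_na(rt, mean(rt, na.rm = TRUE))   # simple mean imputation
  )

Whether (and how) to impute at all is an analysis decision; flagging with is.na() keeps the missingness visible either way.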


Example 3: Wrong Data Type

mean(iat_data$condition)

Output:

[1] NA
Warning message:
In mean.default(iat_data$condition) : argument is not numeric or logical: returning NA

Fix:

str(iat_data)

The condition column is a character or factor — not numeric — so mean() doesn’t make sense; R warns and returns NA rather than stopping.
Debugging often starts with checking structure using str() or summary().


4. Debugging a Real Bug with traceback() and browser()

1. Introduce the Bug

This function tries to compute mean RT by condition but mistakenly refers to rt (a bare name) instead of data$rt.

calculate_group_means <- function(data) {
  # BUG: using rt instead of data$rt
  result <- tapply(rt, data$condition, mean, na.rm = TRUE)
  return(result)
}

calculate_group_means(iat_data)
# Error in tapply(rt, data$condition, mean, na.rm = TRUE) : 
#   arguments must have same length

2. Read the Error, Then Call traceback()

traceback() shows the call stack of the last error. Read it from bottom to top to see where your code entered the failing call.

traceback()
# 3: stop("arguments must have same length")
# 2: tapply(rt, data$condition, mean, na.rm = TRUE) at #3
# 1: calculate_group_means(iat_data)

How to use this:

  • Line 1 shows your function call.
  • Line 2 shows the failing line inside your function: tapply(rt, data$condition, ...).
  • The message “arguments must have same length” hints that the two vectors you passed to tapply() don’t match… or one of them isn’t even a vector of data.

3. Drop Into the Function With browser()

browser() pauses execution inside your function so you can inspect variables at runtime.

calculate_group_means <- function(data) {
  browser()  # Pause here to explore
  result <- tapply(rt, data$condition, mean, na.rm = TRUE)
  return(result)
}

calculate_group_means(iat_data)

When R hits browser(), the console switches to a “Browse” prompt:

Browse[1]> 

Now you can inspect:

Browse[1]> ls()
# [1] "data"

Browse[1]> data[1:5, "rt"]
# [1] 573 457 424 436 540

So data$rt is fine. But what is rt (the bare name) that we passed to tapply()?

Browse[1]> rt
# function (n, df, ncp) { ... }
# <environment: namespace:stats>

Aha! rt is the built-in Student’s t random-number generator from the stats package (it’s a function, not your column). That explains the error: we passed a function to tapply() instead of a numeric vector, so lengths don’t match.

What to look for in browser():

  • ls() – what objects exist in the current function frame?
  • Print suspicious names (like rt) to see what they actually are.
  • Inspect slices of your data, e.g., data[1:5, ] or str(data).

Useful browser() commands:

  • n – next line
  • c – continue until next breakpoint/return
  • Q – quit the browser immediately

Type Q to exit the browser.
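
Beyond browser(), base R offers a few related tools worth knowing (all built in; shown here as a sketch):

debug(calculate_group_means)      # enter the browser every time this function is called
undebug(calculate_group_means)    # turn that off again
debugonce(calculate_group_means)  # enter the browser on the next call only

options(error = recover)          # after any error, pick a call frame to inspect
options(error = NULL)             # restore the default error behavior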

4. Fix the Bug

Use the correct column reference inside tapply().

calculate_group_means <- function(data) {
  result <- tapply(data$rt, data$condition, mean, na.rm = TRUE)
  return(result)
}

calculate_group_means(iat_data)
#    control incongruent 
#   525.8750    501.7143 

What We Learned

  • traceback() tells you where the error happened. Use it right after an error to get the call stack and pinpoint the failing line.
  • browser() lets you pause inside the function and inspect the environment. Use it to check object names, types, and values at the moment things go wrong.
  • Name collisions are real: rt is a base R function. Inside your functions, always use explicit column references (e.g., data$rt) to avoid accidentally calling a different object with the same name.

Tip: If you suspect a name collision, try get("rt", mode = "function") or exists("rt") and typeof(rt) while in the browser to confirm what rt really is.


5. Debugging Workflow in Quarto

  1. Reproduce the error.
  2. Read the error message carefully.
  3. Simplify the code and test smaller pieces.
  4. Inspect your data (str(), head(), summary()).
  5. Print intermediate results with print() or paste().
  6. Use traceback() or browser() to step through function calls.
  7. Render from a clean session to confirm the fix.

In Quarto, every render starts fresh — so if it runs cleanly once, it’ll run cleanly anywhere.
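
You can also trigger that clean-session check from the R console rather than the Render button, assuming the quarto R package is installed (it wraps the Quarto command-line tool, which runs your document in a fresh R process):

# Render the report without relying on anything in your current environment
quarto::quarto_render("reports/debug_example.qmd")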


6. Comparing Loop vs Vectorized Code Speed

We've told you that vectorized versions are more efficient, but let's actually demonstrate this. We're going to use system.time() to compare a slow for-loop version with a vectorized version.

Loop version

system.time({
  for (i in 1:nrow(iat_data)) {
    if (is.na(iat_data[i, "rt"])) {
      iat_data[i, "rt_category"] <- "Unknown"
    } else if (iat_data[i, "rt"] < 500) {
      iat_data[i, "rt_category"] <- "Fast"
    } else {
      iat_data[i, "rt_category"] <- "Slow"
    }
  }
})

Vectorized version

system.time({
  iat_data$rt_category <- ifelse(
    is.na(iat_data$rt), "Unknown",
    ifelse(iat_data$rt < 500, "Fast", "Slow")
  )
})

Compare the two results.
system.time() reports user, system, and elapsed times — the last one is how long you actually waited.

For larger datasets, the vectorized approach can be hundreds of times faster.
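
To see that gap yourself, rerun the comparison on a larger simulated dataset (a minimal sketch; exact timings will vary by machine):

# Same comparison on 10,000 simulated rows; pre-allocate the new column
big <- data.frame(rt = round(rnorm(1e4, mean = 520, sd = 120)))
big$rt[sample(nrow(big), 50)] <- NA
big$rt_category <- NA_character_

system.time({
  for (i in 1:nrow(big)) {
    if (is.na(big$rt[i])) {
      big$rt_category[i] <- "Unknown"
    } else if (big$rt[i] < 500) {
      big$rt_category[i] <- "Fast"
    } else {
      big$rt_category[i] <- "Slow"
    }
  }
})

system.time({
  big$rt_category <- ifelse(is.na(big$rt), "Unknown",
                            ifelse(big$rt < 500, "Fast", "Slow"))
})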


7. Debugging with AI and Refactoring

AI tools like ChatGPT can be powerful partners in debugging — they can help you interpret cryptic error messages, identify logical or scoping mistakes, and even suggest cleaner or faster ways to write your code.
However, they’re not mind-readers. AI tools work best when you provide clear context: what you were trying to do, what you expected to happen, what actually happened, and the relevant error message.


Example 1: Debugging in Context

Suppose you try to compute z-scores within your dataset:

iat_data$z_rt <- (iat_data$rt - mean(rt, na.rm = TRUE)) / sd(rt, na.rm = TRUE)

When you run or render this in Quarto, you see:

Error in mean.default(rt, na.rm = TRUE) : 
  (converted from warning) argument is not numeric or logical: returning NA

Fix:

iat_data$z_rt <- (iat_data$rt - mean(iat_data$rt, na.rm = TRUE)) /
                 sd(iat_data$rt, na.rm = TRUE)

If you asked AI something like:

“Why is this code producing an error when I render my Quarto file?
I’m computing a z-score column in a data frame called iat_data.”

It could explain:

  • Your reaction times exist only as a column inside the data frame; the bare name rt instead resolves to stats::rt, the built-in function we met in Section 4, which is why mean() complains that its argument isn’t numeric or logical.
  • Quarto runs your document in a fresh R session, so only objects explicitly created within the file are available.
  • You need to use iat_data$rt to reference the column correctly.

Example 2: Using AI to Compare Loop vs. Vectorized Solutions

Suppose you wrote this loop to standardize reaction times:

for (i in 1:nrow(iat_data)) {
  iat_data$z_rt[i] <- (iat_data$rt[i] - mean(iat_data$rt, na.rm = TRUE)) / 
                      sd(iat_data$rt, na.rm = TRUE)
}

It works — but it’s repetitive and inefficient.
If you asked AI:

“Can you help me refactor this loop to use a vectorized approach in R?”

It might suggest replacing the entire loop with one line:

iat_data$z_rt <- as.numeric(scale(iat_data$rt))

This uses R’s built-in scale() function, which computes z-scores efficiently in vectorized form. Because scale() returns a one-column matrix, as.numeric() converts the result back to a plain vector before storing it in the data frame.


Example 3: Fixing a Logical Error

Let’s say you tried to flag unusually fast responses:

iat_data$outlier <- ifelse(iat_data$rt < mean(iat_data$rt) - 2 * sd(iat_data$rt), TRUE, FALSE)

But when you knit your report, you get:

Warning: In mean.default(iat_data$rt) : argument is not numeric or logical: returning NA

After checking, you realize one participant’s rt column accidentally contains a string like "fast" due to a data-entry issue.

If you asked AI:

“Why am I getting this warning about mean.default and NA? My rt column should be numeric.”

It might recommend:

  • Checking the structure with str(iat_data) or summary(iat_data).
  • Converting the column type with:
    iat_data$rt <- as.numeric(iat_data$rt)
    
  • Using na.rm = TRUE to skip over non-numeric entries after conversion.
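
Putting those suggestions together (a sketch; suppressWarnings() hides the coercion warning that as.numeric() raises for entries like "fast", which become NA):

# Inspect, convert, and recompute with missing cases removed
str(iat_data$rt)                                          # confirm the column's type
iat_data$rt <- suppressWarnings(as.numeric(iat_data$rt))  # "fast" becomes NA
mean(iat_data$rt, na.rm = TRUE)                           # now ignores the converted NAs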

Example 4: When AI Can Help Refactor Code

Suppose you’ve written this nested function:

flag_outliers <- function(data) {
  data$outlier <- ifelse(is.na(data$rt), NA,
                    ifelse(data$rt < mean(data$rt, na.rm = TRUE) - 2 * sd(data$rt, na.rm = TRUE),
                           TRUE, FALSE))
  return(data)
}

You could ask AI:

“Can you simplify or vectorize this function to make it more efficient or readable?”

AI might respond with something like:

flag_outliers <- function(data, cutoff = 2) {
  data$outlier <- with(data, rt < mean(rt, na.rm = TRUE) - cutoff * sd(rt, na.rm = TRUE))
  return(data)
}

This uses with() for cleaner syntax and adds a flexible cutoff argument.
It’s still readable but more general and efficient — a key refactoring goal.
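
Whichever version you adopt, verify that the refactor produces the same results as the original before replacing it. A minimal sketch, assuming you kept the nested-ifelse version around under the hypothetical name flag_outliers_old():

# Run both versions on the same data and compare the outlier flags
old_result <- flag_outliers_old(iat_data)  # flag_outliers_old(): the nested-ifelse version
new_result <- flag_outliers(iat_data)      # the refactored version above

all.equal(old_result$outlier, new_result$outlier)  # TRUE if the refactor is faithful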


Best Practices When Debugging with AI

  • Provide context: share your full error message, relevant code, and what you expected to happen.
  • Copy the exact console output: AI can only interpret what you show it — omitting the error message makes it guess.
  • Test suggestions piece by piece: don’t paste entire blocks of AI-generated code at once. Run small parts to verify.
  • Ask for explanation, not just the answer: understanding the reasoning helps you learn debugging patterns you can apply later.
  • Reproduce the problem in a Quarto chunk: rendering ensures your code runs cleanly from start to finish — a true reproducibility test.

Documenting AI Refactors

When you use AI to debug or improve your code, always keep a record of what changed. The easiest way is to save both the before and after versions of your code in the same Quarto file, each in its own chunk, with a short comment describing what the AI suggested and how you verified it.

For example:

## Before (loop version)
for (i in 1:nrow(iat_data)) {
  iat_data$z_rt[i] <- (iat_data$rt[i] - mean(iat_data$rt, na.rm = TRUE)) /
                      sd(iat_data$rt, na.rm = TRUE)
}

## After (AI-suggested vectorized version)
iat_data$z_rt <- as.numeric(scale(iat_data$rt))

## AI Debugging Note:
## Asked ChatGPT to "refactor this loop to a vectorized approach."
## Verified results matched the original calculation.

This approach builds a transparent trail of revisions and decisions, mirroring good open-science practices. It also helps you (and anyone reviewing your work) see how your code evolved and what was learned through debugging.


In short, AI tools are most powerful when you collaborate with them like a pair programmer:
show your work, explain your intent, and ask them to clarify or refine.
Used this way, they don’t just fix bugs — they teach you how to think like a debugger.




8. Summary

  • Debugging is a process of discovery — not just fixing mistakes.
  • Always read error messages carefully and test in isolation.
  • Use traceback() and browser() to explore inside functions.
  • Compare performance with system.time().
  • Render Quarto documents from a clean session to ensure reproducibility.
  • Use AI tools to explain and refactor, but always confirm their logic.

Readable, reproducible, and debuggable code isn’t just good practice — it’s good science.