Week 11 · Importing & Inspecting Data
0 · Overview
This week we are moving from isolated R skills to a full data-analysis workflow.
Our goal is to import, inspect, and understand real participant data as a foundation for cleaning and summarizing it later.
You will learn how to:
- Navigate a well-structured project.
- Recognize key file types and where they live.
- Use base R functions to load and inspect participant data.
- Understand how questionnaire data are stored as JSON objects.
1 · Project Organization & Workflow Plan
Before importing any files, make sure your project is organized:
psy1903/
└── web/
    └── npt_project/
        ├── npt_project.Rproj
        ├── data/
        │   ├── raw/
        │   └── cleaned/
        ├── scripts/
        │   ├── score_questionnaire.R
        │   └── process_participant.R
        └── reports/
            └── npt_import.qmd
This structure supports a reproducible workflow:
- raw/ holds the untouched data
- cleaned/ will hold the processed data
- scripts/ stores reusable functions; we have previously done everything in our Quarto report, but now we will see how we can modularize some aspects
- reports/ contains the Quarto report where you narrate the analysis
Start by creating the directory structure. Because this only needs to be done once per project, you can copy this directly into your R console pane:
# Create full project folder structure under psy1903/web/
setwd("~/Desktop/psy1903") # Update to your path
dir.create("web/npt_project/data/raw", recursive = TRUE, showWarnings = FALSE)
dir.create("web/npt_project/data/cleaned", recursive = TRUE, showWarnings = FALSE)
dir.create("web/npt_project/scripts", recursive = TRUE, showWarnings = FALSE)
dir.create("web/npt_project/reports", recursive = TRUE, showWarnings = FALSE)
# Create placeholder R Script and Quarto Report files
file.create("web/npt_project/scripts/score_questionnaire.R")
file.create("web/npt_project/scripts/process_participant.R")
file.create("web/npt_project/reports/npt_import.qmd")
# Create the R Project file
file.create("web/npt_project/npt_project.Rproj")
Update: file.create() produces an empty file, and RStudio will error when opening an empty project file. To avoid this, copy and paste the following code into your console and run it once:
project_text <- "Version: 1.0\n\nRestoreWorkspace: Default\nSaveWorkspace: Default\nAlwaysSaveHistory: Default\n\nEnableCodeIndexing: Yes\nUseSpacesForTab: Yes\nNumSpacesForTab: 4\nEncoding: UTF-8\n\nRnwWeave: Sweave\nLaTeX: pdfLaTeX"
writeLines(project_text, "web/npt_project/npt_project.Rproj")
Download the raw data and add it to your npt_project/data/raw directory: Download here.
2 · Key R Functions You’ll Encounter in This Lesson
Before we start writing new code, it helps to get familiar with some new R functions we’ll be using throughout this lesson. Each one performs a small but essential role in our workflow, from importing data to combining results at the end.
You don’t need to memorize them all, but understanding what each function does and why we need it will make the rest of this week’s videos much clearer.
The table below summarizes the key tools we’ll rely on. We’ll see each of them in action soon, but this overview provides a reference point so you can recognize their purpose when they appear in the code.
| Function | Purpose | Typical Use Here |
|---|---|---|
| `read.csv()` | Imports comma-separated files | Load each participant's `.csv` |
| `head()` / `str()` | Explore data structure | Inspect the file after import |
| `unique()` | Shows all distinct values in a column | Check what trial types or blocks exist |
| `fromJSON()` | Converts a JSON string to an R object | Decode questionnaire responses |
| `unlist()` | Flattens a list into a vector | Simplify parsed JSON values |
| `basename()` | Extracts the file name from a full path | Identify which participant a file belongs to |
| `sub()` | Replaces part of a string using a pattern | Remove the `.csv` extension or rename elements |
| `basename()` + `sub()` | Used together to extract/edit file names | Create participant IDs |
| `source()` | Loads and runs another `.R` script | Use custom functions inside Quarto |
| `list.files()` | Lists all files in a folder | Find all participant `.csv` files |
| `lapply()` | Applies a function to each element in a list | Process all participant files automatically |
| `do.call()` | Calls a function on a list of objects | Combine all participant summaries into one data frame |
Mental Models
- `read.csv()` → like opening a spreadsheet into R.
- `head()` → peek at the first few rows, like scrolling to the top of a sheet.
- `str()` → see the blueprint of the data (column types and structure).
- `unique()` → ask R "what distinct categories exist here?"
- `fromJSON()` → unpack a neatly labeled box of responses.
- `unlist()` → flatten a nested box into a single row of values.
- `basename()` → strip away folders to reveal the file name.
- `sub()` → edit or clean text inside strings.
- `source()` → plug in another script so its functions become available.
- `list.files()` → ask R, "what files are in this directory?"
- `lapply()` → do the same operation for each item in a list (like a batch process).
- `do.call()` → combine all those results together into one complete dataset.
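As a quick illustration, `basename()` and `sub()` can be combined to turn a file path into a participant ID. The path below is a hypothetical example:

```r
# Hypothetical path to one participant's raw data file
file_path <- "data/raw/participant1.csv"

# basename() strips the folders, leaving just the file name
file_name <- basename(file_path)   # "participant1.csv"

# sub() removes the .csv extension, leaving a clean participant ID
participant_id <- sub("\\.csv$", "", file_name)   # "participant1"
```

Note the `\\.csv$` pattern: the `\\.` matches a literal period and the `$` anchors the match to the end of the string, so only the extension is removed.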
3 · Importing and Inspecting a Single Participant File
Before we can analyze data in R, we first have to bring it into our R environment. This process is called importing.
When you collect data from jsPsych or another experimental platform, it’s usually stored in an external file format such as a .csv (comma-separated values) file. This file lives on your computer, but R doesn’t automatically know about it. By importing, we’re telling R:
“Read this file from my project folder and store it as a data frame so I can work with it.” Once imported, the dataset exists as an R object that we can explore, clean, and analyze just like any other R variable.
Why This Step Matters
Importing isn’t just about getting data into R — it’s also about checking the integrity and structure of what you’ve loaded. Even small formatting differences (extra spaces, mismatched column names, missing values) can cause later code to break or produce incorrect results. The inspection step helps us confirm that:
- the file was read correctly,
- columns have the expected names and data types, and
- the experiment produced the structure we expect (one questionnaire row, practice block, experiment block, etc.).
Step-by-Step
- Import one file from `data/raw/` using `read.csv()`.
  - This reads a single participant's data file into R as a data frame.
  - Store it in a variable with a meaningful name (e.g., `participant_data` rather than `df` or `tmp`).
- Inspect the structure of the imported data.
  - `head(participant_data)` shows the first few rows so you can preview what the data look like.
  - `str(participant_data)` displays how R interpreted each column (numeric, character, logical).
  - `unique(participant_data$trialType)` lists all unique values in the `trialType` column, showing what kinds of trials exist.
- Confirm that the dataset contains:
  - one `trialType == "questionnaire"` row,
  - blocks labeled `"practice"` and `"experiment"`,
  - and variables such as `rt`, `response`, `trial_type`, and `correct`.
If something looks off at this stage (for example, all RTs are character strings instead of numbers), you can fix the import settings before moving on. Catching problems early is part of good data hygiene and can save hours of debugging later.
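Putting these steps together, a minimal import-and-inspect script might look like the sketch below. The file name `participant1.csv` is a placeholder for whichever file you import, and the `block` column name in the final check is an assumption about how your experiment labels blocks:

```r
# Import one participant's raw data as a data frame
participant_data <- read.csv("data/raw/participant1.csv")

# Preview the first few rows
head(participant_data)

# See how R interpreted each column (numeric, character, logical)
str(participant_data)

# List the distinct values in the trialType column
unique(participant_data$trialType)

# Fail loudly if the structure is not what we expect
stopifnot(
  sum(participant_data$trialType == "questionnaire") == 1,
  all(c("practice", "experiment") %in% participant_data$block)
)
```

Writing the checks as `stopifnot()` calls means a malformed file stops the script immediately, rather than silently producing wrong results downstream.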
How to Import the File in Practice
There are two main ways to import a dataset into RStudio:
- Using the RStudio File Browser
  - Go to File → Import Dataset → From Text (base).
  - Browse to your `data/raw/` folder and select the `.csv` file.
  - RStudio will preview the data and generate code automatically (usually a `read.csv()` command).
  - You can copy that code into your script for later use.

  This is a good way to check that R can read your file, but it's not ideal for reproducible work since it relies on clicking through menus each time.

- Using `read.csv()` directly in your script
  - Write a line of code like: `participant_data <- read.csv("data/raw/participant1.csv")`
  - This reads the file directly from your project folder and stores it as a data frame called `participant_data`.
  - You can then run the same code again at any time (or share it with someone else) and get the same result.

  Writing the import line yourself is the preferred approach because it makes your workflow transparent and reproducible.
Once you’re comfortable importing and inspecting a single participant file, the same logic will extend to multiple files.
Later in the course, we’ll use R functions like list.files() and lapply() to apply these same steps automatically across all participants.
For now, it’s important to understand how the process works for one dataset before scaling up to many.
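As a preview of that scaling step, the pattern will look roughly like the sketch below. Here `process_participant()` is a placeholder for a function you will write later (in `scripts/process_participant.R`) that takes a file path and returns a one-row summary:

```r
# Find every participant .csv file in data/raw/
# full.names = TRUE returns paths relative to the project root
files <- list.files("data/raw", pattern = "\\.csv$", full.names = TRUE)

# Apply the same processing function to each file,
# producing a list of one-row data frames
summaries <- lapply(files, process_participant)

# Stack all the one-row summaries into a single data frame
all_data <- do.call(rbind, summaries)
```

Notice that no file names are typed by hand: adding a new participant file to `data/raw/` and re-running the script is all it takes to include them in the combined dataset.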
Key Takeaway
Importing and inspecting are the foundation of every data analysis workflow. Before we can filter, summarize, or visualize data, we have to ensure we’re starting from a clean, correctly structured data frame. Think of this as checking your ingredients before cooking. It may seem simple or unimportant, but it determines the quality of everything that follows.
4 · Understanding the JSON Questionnaire
You may notice that the questionnaire row contains a column called response with text that looks like this:
{"item1":4,"item2":3,"item3":1,…}
This is a JSON object — a compact format for storing responses.
We’ll learn how to decode and score it in the next section when we work on questionnaire scoring.
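As a quick preview, decoding one of these strings takes just two of the functions from the table in Section 2. This sketch assumes the jsonlite package is installed, and uses a shortened three-item response for illustration:

```r
library(jsonlite)

# A questionnaire response stored as a JSON string (shortened example)
response_text <- '{"item1":4,"item2":3,"item3":1}'

# fromJSON() converts the string into a named R list
parsed <- fromJSON(response_text)

# unlist() flattens the list into a named numeric vector
scores <- unlist(parsed)   # item1 = 4, item2 = 3, item3 = 1
```

Once the responses are a numeric vector, scoring the questionnaire (summing or averaging items) becomes ordinary R arithmetic.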
5 · Pulling It All Together
At this point, you have:
- Set up a clear project structure with separate folders for data, scripts, and reports.
- Learned several key R functions that will support your workflow.
- Learned how to import a single participant’s file into R and confirm that it loaded correctly.
- Explored the structure of the dataset using tools like `head()`, `str()`, and `unique()`.
- Verified that the experiment produced the expected blocks and variables.
- Noticed that the questionnaire responses are stored in a special format we’ll return to soon.
These steps may feel basic, but they form the foundation for every later stage of data analysis. Before we can write functions, filter data, or calculate scores, we must be certain that we understand the structure and meaning of what we’re working with.
Next, we’ll learn how to take this imported file and start turning the raw data into analyzable measures. We’ll focus on decoding and scoring the questionnaire responses, filtering reaction times, and calculating participant-level summaries that can later be combined across the study to test our hypotheses.