PSY 1903 Programming for Psychologists


Week 11 · Importing & Inspecting Data

0 · Overview

This week we are moving from isolated R skills to a full data-analysis workflow.
Our goal is to import, inspect, and understand real participant data as a foundation for cleaning and summarizing it later.

You will learn how to:

  1. Navigate a well-structured project.
  2. Recognize key file types and where they live.
  3. Use base R functions to load and inspect participant data.
  4. Understand how questionnaire data are stored as JSON objects.

1 · Project Organization & Workflow Plan

Before importing any files, make sure your project is organized:

psy1903/
└── web/
    └── npt_project/
        ├── npt_project.Rproj
        ├── data/
        │   ├── raw/
        │   └── cleaned/
        ├── scripts/
        │   ├── score_questionnaire.R
        │   └── process_participant.R
        └── reports/
            └── npt_import.qmd

This structure supports a reproducible workflow:

  • raw/ holds the untouched data
  • cleaned/ will hold the processed data
  • scripts/ stores reusable functions; we have previously done everything in our Quarto report, but now we will see how we can modularize some aspects
  • reports/ contains the Quarto report where you narrate the analysis

Start by creating the directory structure. Because this only needs to be done once per project, you can copy this directly into your R console pane:

# Create full project folder structure under psy1903/web/
setwd("~/Desktop") # Update to the folder where you keep (or want to keep) psy1903/
dir.create("psy1903/web/npt_project/data/raw", recursive = TRUE, showWarnings = FALSE)
dir.create("psy1903/web/npt_project/data/cleaned", recursive = TRUE, showWarnings = FALSE)
dir.create("psy1903/web/npt_project/scripts", recursive = TRUE, showWarnings = FALSE)
dir.create("psy1903/web/npt_project/reports", recursive = TRUE, showWarnings = FALSE)

# Create placeholder R Script and Quarto Report files
file.create("psy1903/web/npt_project/scripts/score_questionnaire.R")
file.create("psy1903/web/npt_project/scripts/process_participant.R")
file.create("psy1903/web/npt_project/reports/npt_import.qmd")

# Create the R Project file (note the .Rproj extension that RStudio expects)
file.create("psy1903/web/npt_project/npt_project.Rproj")

2 · Key R Functions You’ll Encounter in This Lesson

Before we start writing new code, it helps to get familiar with some new R functions we’ll be using throughout this lesson. Each one performs a small but essential role in our workflow, from importing data to combining results at the end.

You don’t need to memorize them all, but understanding what each function does and why we need it will make the rest of this week’s videos much clearer.

The table below summarizes the key tools we’ll rely on. We’ll see each of them in action soon, but this overview provides a reference point so you can recognize their purpose when they appear in the code.

| Function | Purpose | Typical Use Here |
| --- | --- | --- |
| read.csv() | Imports comma-separated files | Load each participant’s .csv |
| head() / str() | Explore data structure | Inspect the file after import |
| unique() | Shows all distinct values in a column | Check what trial types or blocks exist |
| fromJSON() | Converts a JSON string to an R object | Decode questionnaire responses |
| unlist() | Flattens a list into a vector | Simplify parsed JSON values |
| basename() | Extracts the file name from a full path | Identify which participant a file belongs to |
| sub() | Replaces part of a string using a pattern | Remove the .csv extension or rename elements |
| basename() + sub() | Used together to extract/edit file names | Create participant IDs |
| source() | Loads and runs another .R script | Use custom functions inside Quarto |
| list.files() | Lists all files in a folder | Find all participant .csv files |
| lapply() | Applies a function to each element in a list | Process all participant files automatically |
| do.call() | Calls a function on a list of objects | Combine all participant summaries into one data frame |
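As a quick illustration of the basename() + sub() pairing from the table, here is how a full file path becomes a participant ID. The path below is hypothetical; in practice it would come from list.files() on data/raw/:

```r
# Hypothetical path, for illustration only
file_path <- "data/raw/participant_007.csv"

file_name <- basename(file_path)                  # strips folders: "participant_007.csv"
participant_id <- sub("\\.csv$", "", file_name)   # removes the extension: "participant_007"
participant_id
```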

Mental Models

  • read.csv() → like opening a spreadsheet into R.
  • head() → peek at the first few rows, like scrolling to the top of a sheet.
  • str() → see the blueprint of the data (column types and structure).
  • unique() → ask R “what distinct categories exist here?”
  • fromJSON() → unpack a neatly labeled box of responses.
  • unlist() → flatten a nested box into a single row of values.
  • basename() → strip away folders to reveal the file name.
  • sub() → edit or clean text inside strings.
  • source() → plug in another script so its functions become available.
  • list.files() → ask R, “what files are in this directory?”
  • lapply() → do the same operation for each item in a list (like a batch process).
  • do.call() → combine all those results together into one complete dataset.
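The last three tools (list.files(), lapply(), do.call()) form the batch-processing pattern we will rely on later. Here is a minimal sketch of that pattern; to keep it self-contained, it first writes two tiny demo files to a temporary folder standing in for data/raw/, so the file names and rt values are made up:

```r
# Demo setup only: create a stand-in for data/raw/ with two tiny files
raw_dir <- file.path(tempdir(), "raw")
dir.create(raw_dir, showWarnings = FALSE)
write.csv(data.frame(rt = c(512, 634)), file.path(raw_dir, "p001.csv"), row.names = FALSE)
write.csv(data.frame(rt = c(488, 701, 590)), file.path(raw_dir, "p002.csv"), row.names = FALSE)

# 1. Find all participant .csv files
csv_files <- list.files(raw_dir, pattern = "\\.csv$", full.names = TRUE)

# 2. Apply the same summary function to each file
summaries <- lapply(csv_files, function(path) {
  d <- read.csv(path)
  data.frame(id = sub("\\.csv$", "", basename(path)),
             mean_rt = mean(d$rt))
})

# 3. Stack all one-row summaries into a single data frame
all_participants <- do.call(rbind, summaries)
all_participants
```

In your own project, only steps 1-3 are needed; the summary function would be replaced by the real per-participant processing we build later.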

3 · Importing and Inspecting a Single Participant File

Before we can analyze data in R, we first have to bring it into our R environment. This process is called importing.

When you collect data from jsPsych or another experimental platform, it’s usually stored in an external file format such as a .csv (comma-separated values) file. This file lives on your computer, but R doesn’t automatically know about it. By importing, we’re telling R:

“Read this file from my project folder and store it as a data frame so I can work with it.”

Once imported, the dataset exists as an R object that we can explore, clean, and analyze just like any other R variable.

Why This Step Matters

Importing isn’t just about getting data into R — it’s also about checking the integrity and structure of what you’ve loaded. Even small formatting differences (extra spaces, mismatched column names, missing values) can cause later code to break or produce incorrect results. The inspection step helps us confirm that:

  • the file was read correctly,
  • columns have the expected names and data types, and
  • the experiment produced the structure we expect (one questionnaire row, practice block, experiment block, etc.).

Step-by-Step

  1. Import one file from data/raw/ using read.csv().
    • This reads a single participant’s data file into R as a data frame.
    • Store it in a variable with a meaningful name (e.g., participant_data rather than df or tmp).
  2. Inspect the structure of the imported data.
    • head(participant_data) shows the first few rows so you can preview what the data look like.
    • str(participant_data) displays how R interpreted each column (numeric, character, logical).
    • unique(participant_data$trialType) lists all unique values in the trialType column, showing what kinds of trials exist.
  3. Confirm that the dataset contains:
    • one trialType == "questionnaire" row,
    • blocks labeled "practice" and "experiment",
    • and variables such as rt, response, trial_type, and correct.

If something looks off at this stage (for example, all RTs are character strings instead of numbers), you can fix the import settings before moving on. Catching problems early is part of good data hygiene and can save hours of debugging later.
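In code, steps 1 and 2 look like this. Since we can’t assume your exact file names, this sketch first writes a small made-up data file to a temporary location; in your own work you would skip that setup and point read.csv() at a file inside data/raw/:

```r
# Demo setup only: write a tiny stand-in participant file (skip this in real use)
demo_file <- file.path(tempdir(), "participant_001.csv")
write.csv(data.frame(trialType = c("questionnaire", "practice", "experiment"),
                     rt = c(NA, 612, 534),
                     correct = c(NA, TRUE, FALSE)),
          demo_file, row.names = FALSE)

# Step 1: import one participant file into a clearly named data frame
participant_data <- read.csv(demo_file)   # in practice: "data/raw/<your file>.csv"

# Step 2: inspect what was imported
head(participant_data)                    # preview the first rows
str(participant_data)                     # how R interpreted each column
unique(participant_data$trialType)        # which trial types are present
```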

Key Takeaway

Importing and inspecting are the foundation of every data analysis workflow. Before we can filter, summarize, or visualize data, we have to ensure we’re starting from a clean, correctly structured data frame. Think of this as checking your ingredients before cooking. It may seem simple or unimportant, but it determines the quality of everything that follows.


4 · Understanding the JSON Questionnaire

When you ran your experiment, participants filled out a short questionnaire at the end. The responses to that questionnaire are saved in your data file, usually in a single row where trialType == "questionnaire".

If you look at that row in your imported data, you will see that the response column contains a long text string that looks like this:

{"item1":4,"item2":3,"item3":1,"item4":2,"item5":5,"item6":4,"item7":2,"item8":1,"item9":4,"item10":5}

This format is called JSON, which stands for JavaScript Object Notation. JSON is a common way to store structured data so that different programs and programming languages can easily read and exchange it.

Inside the curly braces { }, information is organized as key–value pairs. Each key (for example "item1") represents a variable name or questionnaire item, and each value (for example 4) represents the participant’s response. Together, these pairs describe one participant’s set of answers.

For example:

| Key | Value | Meaning |
| --- | --- | --- |
| "item1" | 4 | The participant selected response 4 on item 1 |
| "item2" | 3 | The participant selected response 3 on item 2 |
| "item3" | 1 | The participant selected response 1 on item 3 |

Although this looks organized, R does not automatically treat JSON as a data frame or list. When we import our .csv, R reads the entire JSON object as a single character string because it sees only text in that cell.

To analyze these responses, we first need to decode the JSON into a structure that R can work with.

We will do this using two steps:

  1. The function fromJSON() from the jsonlite package will read the JSON string and convert it into an R list, where each item corresponds to a questionnaire response.
  2. The function unlist() will flatten that list into a simple numeric vector so that we can calculate scores (for example, reversing certain items and taking a mean or total).

This conversion step is essential because R cannot perform numeric operations on text. Once the responses are numeric, we can apply statistical logic to compute questionnaire scores, check for missing data, and use the results in later analyses.
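Putting those two steps together on the example string from above (this assumes the jsonlite package is installed; if not, run install.packages("jsonlite") first):

```r
library(jsonlite)

# The same example JSON string shown above, as R would see it in the response column
json_string <- '{"item1":4,"item2":3,"item3":1,"item4":2,"item5":5,"item6":4,"item7":2,"item8":1,"item9":4,"item10":5}'

responses_list <- fromJSON(json_string)   # step 1: named list, one element per item
responses <- unlist(responses_list)       # step 2: flatten to a named numeric vector

responses[["item1"]]   # 4
mean(responses)        # 3.1 -- now numeric operations work
```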

5 · Pulling It All Together

At this point, you have:

  • Organized your project so that data, scripts, and reports each have a clear place.
  • Learned several key R functions that will support your workflow.
  • Successfully imported a single participant file into R and explored its structure.
  • Identified where the questionnaire data live and how they are stored as a JSON object.

These steps may feel basic, but they form the foundation for every later stage of data analysis. Before we can write functions, filter data, or calculate scores, we must be certain that we understand the structure and meaning of what we’re working with.

In the next video, we’ll start transforming this raw data into something we can analyze. You’ll learn how to score the questionnaire responses, filter implausible reaction times, and compute the first participant-level summary metrics.