PSY 1903 Programming for Psychologists


Week 11 · Importing & Inspecting Data

0 · Overview

This week we are moving from isolated R skills to a full data-analysis workflow.
Our goal is to import, inspect, and understand real participant data as a foundation for cleaning and summarizing it later.

You will learn how to:

  1. Navigate a well-structured project.
  2. Recognize key file types and where they live.
  3. Use base R functions to load and inspect participant data.
  4. Understand how questionnaire data are stored as JSON objects.

1 · Project Organization & Workflow Plan

Before importing any files, make sure your project is organized:

psy1903/
└── web/
    └── npt_project/
        ├── npt_project.Rproj
        ├── data/
        │   ├── raw/
        │   └── cleaned/
        ├── scripts/
        │   ├── score_questionnaire.R
        │   └── process_participant.R
        └── reports/
            └── npt_import.qmd

This structure supports a reproducible workflow:

  • raw/ holds the untouched data
  • cleaned/ will hold the processed data
  • scripts/ stores reusable functions; we have previously done everything in our Quarto report, but now we will see how we can modularize some aspects
  • reports/ contains the Quarto report where you narrate the analysis

Start by creating the directory structure. Because this only needs to be done once per project, you can copy this directly into your R console pane:

# Create full project folder structure under psy1903/web/
setwd("~/Desktop") # Update to the folder where you keep (or want to keep) psy1903/
dir.create("psy1903/web/npt_project/data/raw", recursive = TRUE, showWarnings = FALSE)
dir.create("psy1903/web/npt_project/data/cleaned", recursive = TRUE, showWarnings = FALSE)
dir.create("psy1903/web/npt_project/scripts", recursive = TRUE, showWarnings = FALSE)
dir.create("psy1903/web/npt_project/reports", recursive = TRUE, showWarnings = FALSE)

# Create placeholder R Script and Quarto Report files
file.create("psy1903/web/npt_project/scripts/score_questionnaire.R")
file.create("psy1903/web/npt_project/scripts/process_participant.R")
file.create("psy1903/web/npt_project/reports/npt_import.qmd")

# Create the R Project file (note the .Rproj extension that RStudio expects)
file.create("psy1903/web/npt_project/npt_project.Rproj")

2 · Key R Functions You’ll Encounter in This Lesson

Before we start writing new code, it helps to get familiar with some new R functions we’ll be using throughout this lesson. Each one performs a small but essential role in our workflow, from importing data to combining results at the end.

You don’t need to memorize them all, but understanding what each function does and why we need it will make the rest of this week’s videos much clearer.

The table below summarizes the key tools we’ll rely on. We’ll see each of them in action soon, but this overview provides a reference point so you can recognize their purpose when they appear in the code.

| Function | Purpose | Typical Use Here |
| --- | --- | --- |
| read.csv() | Imports comma-separated files | Load each participant’s .csv |
| head() / str() | Explore data structure | Inspect the file after import |
| unique() | Shows all distinct values in a column | Check what trial types or blocks exist |
| fromJSON() | Converts a JSON string to an R object | Decode questionnaire responses |
| unlist() | Flattens a list into a vector | Simplify parsed JSON values |
| basename() | Extracts the file name from a full path | Identify which participant a file belongs to |
| sub() | Replaces part of a string using a pattern | Remove the .csv extension or rename elements |
| basename() + sub() | Used together to extract/edit file names | Create participant IDs |
| source() | Loads and runs another .R script | Use custom functions inside Quarto |
| list.files() | Lists all files in a folder | Find all participant .csv files |
| lapply() | Applies a function to each element in a list | Process all participant files automatically |
| do.call() | Calls a function on a list of objects | Combine all participant summaries into one data frame |
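As a quick illustration of the basename() + sub() pairing from the table, here is how a full file path becomes a participant ID. The path below is hypothetical; in practice it would come from list.files() on data/raw/:

```r
# Hypothetical path, for illustration only
file_path <- "data/raw/participant_007.csv"

file_name <- basename(file_path)                  # strips folders: "participant_007.csv"
participant_id <- sub("\\.csv$", "", file_name)   # removes the extension: "participant_007"
participant_id
```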

Mental Models

  • read.csv() → like opening a spreadsheet into R.
  • head() → peek at the first few rows, like scrolling to the top of a sheet.
  • str() → see the blueprint of the data (column types and structure).
  • unique() → ask R “what distinct categories exist here?”
  • fromJSON() → unpack a neatly labeled box of responses.
  • unlist() → flatten a nested box into a single row of values.
  • basename() → strip away folders to reveal the file name.
  • sub() → edit or clean text inside strings.
  • source() → plug in another script so its functions become available.
  • list.files() → ask R, “what files are in this directory?”
  • lapply() → do the same operation for each item in a list (like a batch process).
  • do.call() → combine all those results together into one complete dataset.
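The last three tools (list.files(), lapply(), do.call()) form the batch-processing pattern we will rely on later. Here is a minimal sketch of that pattern; to keep it self-contained, it first writes two tiny demo files to a temporary folder standing in for data/raw/, so the file names and rt values are made up:

```r
# Demo setup only: create a stand-in for data/raw/ with two tiny files
raw_dir <- file.path(tempdir(), "raw")
dir.create(raw_dir, showWarnings = FALSE)
write.csv(data.frame(rt = c(512, 634)), file.path(raw_dir, "p001.csv"), row.names = FALSE)
write.csv(data.frame(rt = c(488, 701, 590)), file.path(raw_dir, "p002.csv"), row.names = FALSE)

# 1. Find all participant .csv files
csv_files <- list.files(raw_dir, pattern = "\\.csv$", full.names = TRUE)

# 2. Apply the same summary function to each file
summaries <- lapply(csv_files, function(path) {
  d <- read.csv(path)
  data.frame(id = sub("\\.csv$", "", basename(path)),
             mean_rt = mean(d$rt))
})

# 3. Stack all one-row summaries into a single data frame
all_participants <- do.call(rbind, summaries)
all_participants
```

In your own project, only steps 1-3 are needed; the summary function would be replaced by the real per-participant processing we build later.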

3 · Importing and Inspecting a Single Participant File

Before we can analyze data in R, we first have to bring it into our R environment. This process is called importing.

When you collect data from jsPsych or another experimental platform, it’s usually stored in an external file format such as a .csv (comma-separated values) file. This file lives on your computer, but R doesn’t automatically know about it. By importing, we’re telling R:

“Read this file from my project folder and store it as a data frame so I can work with it.”

Once imported, the dataset exists as an R object that we can explore, clean, and analyze just like any other R variable.

Why This Step Matters

Importing isn’t just about getting data into R — it’s also about checking the integrity and structure of what you’ve loaded. Even small formatting differences (extra spaces, mismatched column names, missing values) can cause later code to break or produce incorrect results. The inspection step helps us confirm that:

  • the file was read correctly,
  • columns have the expected names and data types, and
  • the experiment produced the structure we expect (one questionnaire row, practice block, experiment block, etc.).

Step-by-Step

  1. Import one file from data/raw/ using read.csv().
    • This reads a single participant’s data file into R as a data frame.
    • Store it in a variable with a meaningful name (e.g., participant_data rather than df or tmp).
  2. Inspect the structure of the imported data.
    • head(participant_data) shows the first few rows so you can preview what the data look like.
    • str(participant_data) displays how R interpreted each column (numeric, character, logical).
    • unique(participant_data$trialType) lists all unique values in the trialType column, showing what kinds of trials exist.
  3. Confirm that the dataset contains:
    • one trialType == "questionnaire" row,
    • blocks labeled "practice" and "experiment",
    • and variables such as rt, response, trial_type, and correct.

If something looks off at this stage (for example, all RTs are character strings instead of numbers), you can fix the import settings before moving on. Catching problems early is part of good data hygiene and can save hours of debugging later.
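In code, steps 1 and 2 look like this. Since we can’t assume your exact file names, this sketch first writes a small made-up data file to a temporary location; in your own work you would skip that setup and point read.csv() at a file inside data/raw/:

```r
# Demo setup only: write a tiny stand-in participant file (skip this in real use)
demo_file <- file.path(tempdir(), "participant_001.csv")
write.csv(data.frame(trialType = c("questionnaire", "practice", "experiment"),
                     rt = c(NA, 612, 534),
                     correct = c(NA, TRUE, FALSE)),
          demo_file, row.names = FALSE)

# Step 1: import one participant file into a clearly named data frame
participant_data <- read.csv(demo_file)   # in practice: "data/raw/<your file>.csv"

# Step 2: inspect what was imported
head(participant_data)                    # preview the first rows
str(participant_data)                     # how R interpreted each column
unique(participant_data$trialType)        # which trial types are present
```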

Key Takeaway

Importing and inspecting are the foundation of every data analysis workflow. Before we can filter, summarize, or visualize data, we have to ensure we’re starting from a clean, correctly structured data frame. Think of this as checking your ingredients before cooking. It may seem simple or unimportant, but it determines the quality of everything that follows.


4 · Understanding the JSON Questionnaire

When you ran your experiment, participants filled out a short questionnaire at the end. The responses to that questionnaire are saved in your data file, usually in a single row where trialType == "questionnaire".

If you look at that row in your imported data, you will see that the response column contains a long text string that looks like this:

{"item1":4,"item2":3,"item3":1,"item4":2,"item5":5,"item6":4,"item7":2,"item8":1,"item9":4,"item10":5}

This format is called JSON, which stands for JavaScript Object Notation. JSON is a common way to store structured data so that different programs and programming languages can easily read and exchange it.

Inside the curly braces { }, information is organized as key–value pairs. Each key (for example "item1") represents a variable name or questionnaire item, and each value (for example 4) represents the participant’s response. Together, these pairs describe one participant’s set of answers.

For example:

| Key | Value | Meaning |
| --- | --- | --- |
| "item1" | 4 | The participant selected response 4 on item 1 |
| "item2" | 3 | The participant selected response 3 on item 2 |
| "item3" | 1 | The participant selected response 1 on item 3 |

Although this looks organized, R does not automatically treat JSON as a data frame or list. When we import our .csv, R reads the entire JSON object as a single character string because it sees only text in that cell.

To analyze these responses, we first need to decode the JSON into a structure that R can work with.

We will do this using two steps:

  1. The function fromJSON() from the jsonlite package will read the JSON string and convert it into an R list, where each item corresponds to a questionnaire response.
  2. The function unlist() will flatten that list into a simple numeric vector so that we can calculate scores (for example, reversing certain items and taking a mean or total).

This conversion step is essential because R cannot perform numeric operations on text. Once the responses are numeric, we can apply statistical logic to compute questionnaire scores, check for missing data, and use the results in later analyses.
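Putting those two steps together on the example string from above (this assumes the jsonlite package is installed; if not, run install.packages("jsonlite") first):

```r
library(jsonlite)

# The same example JSON string shown above, as R would see it in the response column
json_string <- '{"item1":4,"item2":3,"item3":1,"item4":2,"item5":5,"item6":4,"item7":2,"item8":1,"item9":4,"item10":5}'

responses_list <- fromJSON(json_string)   # step 1: named list, one element per item
responses <- unlist(responses_list)       # step 2: flatten to a named numeric vector

responses[["item1"]]   # 4
mean(responses)        # 3.1 -- now numeric operations work
```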

5 · Pulling It All Together

At this point, you have:

  • Organized your project so that data, scripts, and reports each have a clear place.
  • Learned several key R functions that will support your workflow.
  • Successfully imported a single participant file into R and explored its structure.
  • Identified where the questionnaire data live and how they are stored as a JSON object.

These steps may feel basic, but they form the foundation for every later stage of data analysis. Before we can write functions, filter data, or calculate scores, we must be certain that we understand the structure and meaning of what we’re working with.

In the next video, we’ll start transforming this raw data into something we can analyze. You’ll learn how to score the questionnaire responses, filter implausible reaction times, and compute the first participant-level summary metrics.