Getting Started in R
Setting up our basic directory structure
When you open RStudio, a new blank R script should automatically open. If it doesn't, please open one with command + shift + N or File → New File → R Script. Then copy and paste the following code into a new R Script:
#### (1) Setup initial directory structure -------------------------------------
## Start by setting your working directory to your psy1903 folder. Replace "~/Desktop/" with the correct path to your psy1903 directory:
setwd("~/Desktop/psy1903/")
## Create a new parent directory called "stats" where we will be doing all of our R work:
dir.create("stats/")
## Create a new directory called "rIntro" for today's exercises:
dir.create("stats/rIntro")
## Create new subdirectories "data", "scripts", & "output" for today's exercises:
dir.create("stats/rIntro/data")
dir.create("stats/rIntro/scripts")
dir.create("stats/rIntro/output")
## Set working directory to the rIntro/scripts directory, which will be our home base for today:
setwd("~/Desktop/psy1903/stats/rIntro/scripts")
## Save this script as R_introduction.R within your scripts directory (you can just use command+S or File → Save As)
After you run this code (select all relevant lines and hit command + enter, or hit Run in the Script Editor panel), you will have a new set of directories for us to work from within your psy1903 directory.
Please also download this data.csv and place it into your data directory.
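Running the setup code a second time produces warnings that the directories already exist. As a minimal sketch of one way around that, the folders can be created only when they are missing. The `project_root` below is a hypothetical stand-in that points at a temporary folder so the example never touches your real files; in practice you would swap in the path to your psy1903 directory.

```r
# Hypothetical project root inside R's temp directory (an assumption for this
# sketch); replace with the path to your own psy1903 folder.
project_root <- file.path(tempdir(), "psy1903")

# file.path() builds one path per subdirectory name.
sub_dirs <- file.path(project_root, "stats", "rIntro", c("data", "scripts", "output"))

for (d in sub_dirs) {
  # Only create the directory if it is not already there.
  if (!dir.exists(d)) {
    dir.create(d, recursive = TRUE)  # recursive = TRUE also creates parent folders
  }
}

dir.exists(sub_dirs)  # one TRUE/FALSE per subdirectory
```

Because of the `dir.exists()` check, this version can be re-run safely at the top of a script.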
Installing Packages
R packages are collections of functions, datasets, documentation, and other resources bundled together to extend R's base capabilities. Packages allow R users to easily share and use code, making it possible to perform specialized tasks like data manipulation, visualization, statistical modeling, and more.
To use an R package, you must first install it. The install.packages() function downloads and installs a package from CRAN (the Comprehensive R Archive Network), R’s main package repository. For example, to install the ggplot2 package, we could use the code install.packages("ggplot2").
Even if you've installed a package, you must load it in your current R session before you can use it. The library() function allows you to do this. For example, to load the ggplot2 package, we could use the code library("ggplot2").
Because packages must be loaded every time you start a new R session, and it's hard to remember whether you've already installed a given package, we can use a package called pacman to streamline the process. Copy and paste the following code into your R_introduction script:
#### (2) Installation of packages ----------------------------------------------
## Packages are essential toolboxes that you load into R and allow you to do cool things with your data
## One package called "pacman" makes installing packages very easy...
if (!require("pacman")) {install.packages("pacman"); require("pacman")} # First install and load in pacman to R
## Then use p_load and a list of all of the packages that you need for the project (with each one being in "quotes")
p_load("tidyverse","rstudioapi","lme4","emmeans","psych","corrplot") # tidyverse contains many packages like dplyr, tidyr, stringr, and ggplot2, among others, and the additional packages should cover our data manipulations, plotting, and analyses
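For comparison, here is a rough base-R sketch of the "install only if missing, then load" pattern that pacman streamlines. It is an illustration rather than a description of how pacman works internally. The package name `stats` is chosen only because it ships with R, so the example runs without downloading anything.

```r
# Package to check; "stats" is bundled with R, so nothing is installed here.
pkg <- "stats"

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE tells library() that pkg holds the name as a string.
library(pkg, character.only = TRUE)

# Confirm that the package is now attached to the session.
"package:stats" %in% search()
```

With several packages this check has to be repeated for each one, which is the bookkeeping that p_load() handles in a single call.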
Basic Syntax
Anything following a hashtag # will be a comment in R and will not be read by R when the script executes. Hashtags can be used at the beginning of the line to comment out the full line, or can be used after a line of code so that the remainder of the line is a comment.
R does not have a way to comment out an entire block of code at once; you have to place a hashtag at the beginning of every line you want it to ignore. However, you can do this quickly by selecting all lines you want to comment out or uncomment and hitting command + shift + C on Mac or control + shift + C on Windows.
To create sections of code with a header in R, you can use 4+ hashtags in a row before the header text, and either the same number of hashtags after the header text or a series of hyphens:
#### This will create a section of code -------------------------------------
# This is a comment
3 + 5 # This is also a comment, but the "3 + 5" before the hashtag is executable code
#### This will create another section of code ####
# Best practice is to use some comments to describe the goal of this section/line of code
x <- 3 + 5 # Code goes here
Getting Help
You can look up help for different functions in one of four ways:
- Use the help function: help(function) will bring up the help documentation for a given function, or you can use help("function", package = "package") to read the help documentation for a function from a specific package if there are multiple functions with the same name
- Use the ? feature to get help with a particular function: ?function will bring up the help documentation for a given function
- Use the search feature under the Help panel
- Use the ?? feature to search all functions for a particular string: ??"keyword" will search all functions for the keyword and bring up a list of help documentation to select from
Basic Expressions and Operators
- + for addition
- - for subtraction
- * for multiplication
- / for division
- ^ for exponentiation
- () can be used for more complex equations; R will follow order of operations
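To make the order of operations concrete, here are a few expressions you can run in the console. The numbers are arbitrary and chosen only for illustration.

```r
2 + 3 * 4     # multiplication happens before addition: 14
(2 + 3) * 4   # parentheses force the addition first: 20
10 / 4        # division returns a decimal: 2.5
2 ^ 3 ^ 2     # exponents are evaluated right to left, so this is 2 ^ 9: 512
-2 ^ 2        # exponentiation happens before the minus sign: -4
(-2) ^ 2      # parentheses make the base negative: 4
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit.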
Variable Assignment
R assigns variables using the operator <-, following the syntax:
variable <- value/object/function you want to assign
If you have already assigned a variable and then reassign it, R will overwrite the first assignment.
myVar <- 8 # This will create a variable called myVar and assign it a value of 8.
myVar + 2 # This will use the myVar variable and add 2, outputting 10
myVar <- myVar + 2 # This will overwrite the value of 8, and myVar will now be assigned 10 instead
Note: R is case sensitive, so myVar and MyVar would be considered two separate variables and would not overwrite each other. Additionally, if you mistype myVar as MyVar, you will get an error because MyVar is undefined: Error: object 'MyVar' not found
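The case-sensitivity point can be checked directly. The sketch below assumes a fresh session in which no object named `MyVar` has been created. It uses `exists()` to test whether a name is defined and `tryCatch()` to capture the error message instead of stopping the script.

```r
myVar <- 8

exists("myVar")  # TRUE: this exact name has been assigned
exists("MyVar")  # FALSE: different capitalization, so a different name

# Capture the error text that R produces for the undefined name.
msg <- tryCatch(MyVar, error = function(e) conditionMessage(e))
msg
```

If you see this error in your own work, checking the capitalization and spelling against the Environment panel is usually the quickest fix.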
Existing Functions
R has many existing functions, which typically follow the syntax function_name(argument1, argument2, ...). The order of arguments is important when they are not named, and the help feature can tell you what the order should be.
Some basic built-in functions are sum(), mean(), and length(). If you don't remember what the c() is doing below, check out the Vector section of the R Data Types page.
sum(1, 2, 3) # Adds numbers 1, 2, and 3, returns 6
mean(c(1, 2, 3)) # Finds the mean (average) of the numbers.
length(c(1, 2, 3)) # Finds the length of a vector, returns 3
Many of these functions can take additional arguments as well. One of the most common across functions is the specification of how to handle NA data.
mean(c(1, 2, 3, NA, 5)) # Will output NA because it doesn't know how to handle it
mean(c(1, 2, 3, NA, 5), na.rm = TRUE) # Will remove the NA and calculate the mean of the remaining numbers, outputting 2.75 (the correct answer)
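As a short sketch of why argument order matters, the example below uses the built-in seq() function, whose first three arguments are from, to, and by. The `scores` vector is made up for illustration.

```r
# Positional arguments are matched in order: from = 2, to = 10, by = 4.
by_position <- seq(2, 10, 4)                # 2 6 10

# Named arguments can be given in any order with the same result.
by_name <- seq(by = 4, to = 10, from = 2)   # 2 6 10

# Changing the positional order changes the meaning: from = 4, to = 10, by = 2.
reordered <- seq(4, 10, 2)                  # 4 6 8 10

# na.rm works the same way in sum() as it does in mean().
scores <- c(1, 2, 3, NA, 5)
sum(is.na(scores))          # counts the missing values: 1
sum(scores, na.rm = TRUE)   # adds the non-missing values: 11
```

Naming arguments such as na.rm = TRUE is a good habit because the code stays correct even if you forget the positional order.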
Loading a Dataset
The read.csv() function in R is commonly used to load data from a CSV (Comma-Separated Values) file into R as a data frame.
Basic Syntax of read.csv()
The basic syntax of read.csv() is:
mydata <- read.csv("path/to/your/file.csv")
- "path/to/your/file.csv" is the file path to your CSV file. If the file is in your current working directory, you only need to specify the file name (e.g., "file.csv").
- mydata is the variable where the data frame will be stored.
Example: Suppose you have a file named "data.csv" in your working directory. You can load it as follows:
mydata <- read.csv("data.csv")
However, if "data.csv" is not in your current working directory, you would need to provide the full path. It is often best practice to include the full path in case your working directory or file structure changes. On Windows, use either double backslashes (\\) or single forward slashes (/) in the path to avoid errors, because a single backslash is treated as an escape character inside an R string.
mydata <- read.csv("~/Desktop/psy1903/stats/rIntro/data/data.csv")
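One way to avoid slash problems altogether is to build the path with file.path() and confirm the file exists before reading it. The sketch below is self-contained: it writes a small made-up CSV into a temporary folder (the `id` and `score` columns are hypothetical, not the contents of the course data.csv) and then reads it back.

```r
# Hypothetical stand-in for the rIntro/data folder, placed in R's temp directory.
data_dir <- file.path(tempdir(), "rIntro", "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# file.path() joins the pieces with "/", which R accepts on Mac and Windows.
csv_path <- file.path(data_dir, "data.csv")

# Made-up example data so that the sketch runs anywhere.
write.csv(data.frame(id = 1:3, score = c(10, 12, 9)), csv_path, row.names = FALSE)

# Check for the file first so that a wrong path gives a clear message.
if (file.exists(csv_path)) {
  mydata <- read.csv(csv_path)
} else {
  stop("Could not find file: ", csv_path)
}

nrow(mydata)  # 3 rows of data
```

In your own script, `csv_path` would point at the data.csv you downloaded rather than a temporary file.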
The read.csv() function has several arguments that control how data is read:
- file: Specifies the path to the file (required), and should always be the first argument.
- header: Set to TRUE by default, indicating that the first row of the CSV file contains column names. Set it to FALSE if your file doesn’t have headers.
- sep: Specifies the delimiter used in the file. read.csv() defaults to a comma (,), so for CSV files, you typically don't need to set this.
- stringsAsFactors: Set to FALSE if you don’t want to automatically convert character columns to factors. This argument can help prevent unexpected conversions to categorical data. Since R 4.0.0 the default is already FALSE; in older versions of R it was TRUE.
- na.strings: Defines which values should be considered as NA (missing values). This is helpful if your file uses special symbols to denote missing data, like "NA" or "?".
Checking the Data
Once loaded, it’s a good idea to examine the data using a few basic functions:
head(mydata) # View the first few rows
str(mydata) # See the structure of the data frame
summary(mydata) # Get a summary of each column
Example with Full Options
Here’s an example of read.csv() with several options in use:
mydata <- read.csv("~/Desktop/psy1903/stats/rIntro/data/data.csv", header = TRUE, stringsAsFactors = FALSE, na.strings = c("NA", "?"))
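To see what na.strings changes, here is a small self-contained sketch. It writes a made-up three-row CSV to a temporary file; the `participant` and `score` columns and all values are assumptions for illustration, not the contents of the course data.csv. It then reads the file with and without "?" listed as a missing-value marker.

```r
# Made-up CSV text in which "?" marks a missing score.
csv_text <- "participant,moodGroup,score
1,happy,5
2,sad,?
3,happy,7"

tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# Without na.strings, "?" is ordinary text, so the whole column is read as character.
default_read <- read.csv(tmp, stringsAsFactors = FALSE)
class(default_read$score)     # "character"

# With na.strings, "?" becomes NA and the column is read as numbers.
clean_read <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE,
                       na.strings = c("NA", "?"))
class(clean_read$score)       # "integer"
sum(is.na(clean_read$score))  # 1 missing value
```

A numeric column that unexpectedly shows up as character in str() is often a sign that the file uses a missing-value marker not yet listed in na.strings.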
Change the Structure of One Column:
To change the structure of one column, in this case moodGroup from character to factor, start by specifying the dataframe mydata. Then we can use the $ operator, which accesses or extracts specific elements of a list or a dataframe by name, in this case moodGroup:
mydata$moodGroup
If we look at the structure of mydata$moodGroup, we can see that it is currently a character. To turn it into a factor, we can use the as.factor() function:
as.factor(mydata$moodGroup)
However, this just displays the column as a factor within the console without changing mydata, so we need to reassign it within our dataframe:
mydata$moodGroup <- as.factor(mydata$moodGroup)
Now if we check the structure of the dataframe, we can see that moodGroup is a factor and no longer a character:
str(mydata)
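The same conversion can be tried on a tiny made-up data frame, which also shows two functions that are handy for checking a factor: levels() and table(). The `demo` data frame and its mood values are hypothetical and stand in for the real data.

```r
# Hypothetical data frame with a character column.
demo <- data.frame(moodGroup = c("happy", "sad", "happy", "neutral"),
                   stringsAsFactors = FALSE)
is.character(demo$moodGroup)  # TRUE before conversion

# Convert and reassign so that the change is stored in the data frame.
demo$moodGroup <- as.factor(demo$moodGroup)

is.factor(demo$moodGroup)     # TRUE after conversion
levels(demo$moodGroup)        # "happy" "neutral" "sad" (alphabetical by default)
table(demo$moodGroup)         # number of rows in each group
```

Checking levels() after converting is a quick way to catch typos in group labels, since each distinct spelling becomes its own level.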
Saving and Pushing to GitHub
Save your RScript!
Don't forget to save your RScript frequently (File → Save or command + S) so that you don't lose your work.
In RStudio, saving your workspace allows you to preserve all objects (like variables, data frames, and functions) currently in your R environment so that you can easily resume your work later. You may be prompted to save your workspace when you close RStudio, or you can do so manually with the command:
save.image(file = "workspace.RData")
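A saved workspace can be restored in a later session with load(). The following is a minimal sketch that assumes it is run at the top level of a script or console, since save.image() saves the objects in the global environment. The file is written to a temporary folder and the `score` variable is made up for illustration.

```r
# Hypothetical location for the workspace file, inside R's temp directory.
workspace_path <- file.path(tempdir(), "workspace.RData")

score <- 42                        # an object we want to keep
save.image(file = workspace_path)  # save everything in the global environment

rm(score)                          # simulate starting over without the object
exists("score")                    # FALSE

load(workspace_path)               # restore the saved objects
score                              # 42 again
```

Note that load() restores objects only; packages still need to be loaded again with library() or p_load() in each new session.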
Push to GitHub
Once all of your work in R for a given exercise is done, please push your RScripts to GitHub. To do so, open VSCode and go to the Source Control panel to commit and push any changes, just like we did when working on our JavaScript and HTML files.
Download the RScript
Feel free to access my version of the R_Introduction.R script we just created (with a few extra notes) to compare to yours!