PSY 1903 Programming for Psychologists

Functions and Loops in R

Creating New Functions

In R, you create a new function with the function keyword, followed by the code that performs the function's task, and assign the result to a name. The syntax for creating a function is:

function_name <- function(arg1, arg2, ...) {
  # Code to perform the function's task
  # Return a result using return() or the final evaluated expression
}
  • function_name: The name of your function, which will be used to call it later.
  • function: the keyword that defines the code that follows as a function, which is then stored under the name function_name.
  • arg1, arg2, ...: Arguments or parameters the function will take. These are optional if the function does not need inputs.
  • Function body: Code inside the { } braces that performs the desired task.
  • Return value: The last evaluated expression is returned automatically, or you can use return(value) to specify what the function should return explicitly.
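
As a minimal sketch of the two ways to return a value, the hypothetical functions below (add_implicit and add_explicit are illustrative names, not part of the course materials) compute the same sum:

```r
# Hypothetical example: two equivalent ways to return a value

# Implicit return: the last evaluated expression is returned
add_implicit <- function(x, y) {
  x + y
}

# Explicit return: return() hands back the value and exits the function
add_explicit <- function(x, y) {
  return(x + y)
}

add_implicit(2, 3)  # 5
add_explicit(2, 3)  # 5
```

Both styles are common in R code; the explicit form is mainly useful when a function needs to exit early.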

Function without Arguments (inputs)

For example, remember the JavaScript function for generating a random number between 1 and 10:

// JavaScript random number between 1 and 10 generator
function getRandomNumber() {
    return Math.floor(Math.random() * 10) + 1;
}

This function works by:

  • Generating a random number between 0 (inclusive) and 1 (exclusive) with Math.random().
  • Multiplying it by 10 to get a number between 0 (inclusive) and 10 (exclusive).
  • Using Math.floor to round down to the nearest integer, giving a whole number from 0 to 9.
  • Adding 1 to shift the range from [0, 9] to [1, 10].

In R, we can use the sample() function to create a similar function.

# R random number between 1 and 10 generator
getRandomNumber <- function() {
  sample(1:10, 1)
}

This function works by:

  • sample(1:10, 1) selects a single integer randomly from the sequence 1:10.
    • The 1 tells sample to return a single number from that range.
  • This avoids the need for rounding and shifting: the result is already an integer in the range [1, 10].

Once the function has been defined (i.e., you have run the code that sets up the function once), it will appear in the Functions section of the RStudio Environment Panel. To use it, call getRandomNumber() wherever you need it in your code; each call returns a random integer between 1 and 10.
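
A short usage sketch follows. The variable names roll and draws and the seed value are illustrative assumptions; set.seed() is the base R function for making random draws reproducible.

```r
getRandomNumber <- function() {
  sample(1:10, 1)
}

# set.seed() makes the random draws reproducible (the seed value is arbitrary)
set.seed(1903)

roll <- getRandomNumber()  # store the result in a variable
print(roll)                # an integer between 1 and 10

# Call the function 100 times and check the smallest and largest values drawn
draws <- replicate(100, getRandomNumber())
range(draws)
```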

Function with Arguments (inputs)

You can also define arguments (or parameters) that the function will accept. These arguments are used to pass data into the function for it to perform a specific task. When defining arguments, it's a good practice to give them descriptive names that reflect the purpose of the variable, for example min and max.

For example, instead of generating a random number that is always between 1 and 10, we can input a minimum and maximum number to our getRandomNumber() function:

getRandomNumber <- function(min, max) {
  sample(min:max, 1)
}
  • min and max: These are the arguments to the function, allowing you to specify the range from which the random number will be drawn.
  • min:max creates a sequence from min to max.
  • sample(min:max, 1): The sample() function takes the sequence from min to max and picks one random number from that range.
    • The 1 tells sample to return a single number from that range.
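
The calls below sketch how the two-argument version might be used. They also illustrate one caveat of sample(): when it is given a single number x (greater than or equal to 1), it samples from 1:x rather than returning x. getRandomNumberSafe is a hypothetical variant added here for illustration, not part of the course materials.

```r
getRandomNumber <- function(min, max) {
  sample(min:max, 1)
}

getRandomNumber(1, 6)    # one integer between 1 and 6
getRandomNumber(18, 65)  # one integer between 18 and 65

# Caveat: when min equals max, min:max is a single number, and sample()
# treats a single number x as "sample from 1:x".
# getRandomNumber(5, 5) can therefore return any integer from 1 to 5.

# Hypothetical variant that avoids this by sampling an offset instead
getRandomNumberSafe <- function(min, max) {
  min + sample.int(max - min + 1, 1) - 1
}

getRandomNumberSafe(5, 5)  # always 5
```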

You can specify as many arguments as the function needs. For instance, if we wanted our getRandomNumber(min, max) function to return more than one random number from the sequence, we could add a third argument, number:

getRandomNumber <- function(min, max, number) {
  sample(min:max, number)
}
  • number: how many random numbers the function should return. If we input 1 it will return 1 number, 3 will return 3 numbers, etc.
  • For example, getRandomNumber(1, 10, 3) will return 3 random numbers between 1 and 10.
  • You can also name the arguments with the = operator when calling the function, for example: getRandomNumber(min = 1, max = 10, number = 3)
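
One detail worth seeing in action: sample() draws without replacement by default, so the returned numbers never repeat, and asking for more numbers than the range contains is an error. The sketch below illustrates this; getRandomNumberRepeat is a hypothetical variant, not part of the course materials, that passes replace = TRUE to allow repeats.

```r
getRandomNumber <- function(min, max, number) {
  sample(min:max, number)
}

three <- getRandomNumber(min = 1, max = 10, number = 3)
length(three)         # 3
anyDuplicated(three)  # 0, because sample() draws without replacement by default

# Asking for more values than the range contains is an error:
# getRandomNumber(1, 10, 11)

# Hypothetical variant that allows repeated values
getRandomNumberRepeat <- function(min, max, number) {
  sample(min:max, number, replace = TRUE)
}

length(getRandomNumberRepeat(1, 10, 11))  # 11
```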

If a function in R is called without inputs and does not have default values set for its arguments, R will return an error indicating that the required arguments are missing.

getRandomNumber <- function(min, max) {
  sample(min:max, 1)
}

getRandomNumber()
# Error in getRandomNumber() : argument "min" is missing, with no default

Functions with Default Arguments

To avoid errors caused by forgetting an argument, or to create a function that usually runs with certain values but lets you override them, you can give each argument a default value using the syntax:

my_function <- function(arg1 = default_value1, arg2 = default_value2) {
  # Function body
}

For example, if we want to be able to specify a min, max and number of random numbers to return, but we want to default to a single number between 1 and 10, we can define our getRandomNumber(min, max, number) function as:

getRandomNumber <- function(min = 1, max = 10, number = 1) {
  sample(min:max, number)
}

By default, this returns a single number between 1 and 10 if we call getRandomNumber() without any arguments, but we can also supply the min, max, and number arguments to override all or some of these defaults:

getRandomNumber()         # Will return a random number based on the default values of min=1, max=10, number=1
getRandomNumber(2)        # Will override the default minimum value to 2 and return a single number (default number) between 2 and 10 (default maximum)
getRandomNumber(max=20)   # Will override the default maximum value and return a single number (default number) between 1 (default minimum) and 20
getRandomNumber(18,65,1)  # Will override all defaults and return a single number between 18 and 65
getRandomNumber(min=18, max=65, number=1)  # Will override all defaults and return a single number between 18 and 65

Loops and Conditionals

Loops and conditional statements are used to control the flow of code. Specifically, for loops, if statements, and the ifelse() function are frequently used for iteration and decision-making.

For Loops

A for loop allows you to iterate over a sequence (such as a vector or a range of numbers) and perform an action for each element in that sequence. The basic syntax is:

for (variable in sequence) {
  # Code to execute for each iteration
}
  • variable: This is a temporary variable that takes the value of each element in the sequence during each iteration.
    • By convention, R users often name this variable i (for index or item), but any valid variable name works.
  • sequence: A vector, list, or range of numbers that the loop will iterate over.
  • The code inside the loop is executed once for each element in the sequence.

Example: Printing Numbers 1 to 5

for (i in 1:5) {
  print(i)  # Prints numbers from 1 to 5
}
  • The loop starts with a sequence of 1:5 (1 through 5) and selects each individual item i from that sequence
  • The loop starts with i = 1, then i = 2, and so on until i = 5.
  • The loop performs the function on each item i, in this case the print(i) function printing each i value
  • Each value of i is printed on a new line
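
A for loop is not limited to number ranges or to printing. As a small sketch, the loop below iterates over a character vector, and a second loop accumulates a running total; the names conditions and total are illustrative assumptions.

```r
# Hypothetical vector of condition names
conditions <- c("control", "low", "high")

# The loop variable takes each element of the vector in turn
for (condition in conditions) {
  print(paste("Running condition:", condition))
}

# Accumulating a running total across iterations
total <- 0
for (i in 1:5) {
  total <- total + i
}
total  # 15
```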

If we want to add an age column to our mydata data frame and fill it with random numbers between 18 and 65, we can use the combination of a for loop and our getRandomNumber(min, max, number) function. We can also use what we learned about indexing rows dynamically with the nrow() function to dynamically calculate the end of our for loop sequence (making it 1:last_row):

mydata$age <- NA # Creates a new column called "age" and fills it with NA (missing) values

for (i in 1:nrow(mydata)) {
  mydata[i, ]$age <- getRandomNumber(min = 18, max = 65, number = 1)
}

Where:

  • for (i in 1:nrow(mydata)): sets up a for loop that iterates over each row of mydata.
    • 1:nrow(mydata) creates a sequence from 1 to the total number of rows in mydata.
    • i is the loop index variable, representing the row number in each iteration.
    • Everything within the { } brackets is the code run on each iteration.
  • mydata[i, ] selects the i-th row of mydata.
  • Using mydata[i, ] lets you access all columns in the i-th row of mydata, so to specify only the age column, we use $age.
    • Now we are only accessing and modifying a single cell: row i column age.
  • getRandomNumber(min = 18, max = 65, number = 1) generates a single number between 18 and 65.
  • The assignment operator <- assigns this random number to the single cell we are accessing.
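
To try this end to end, the sketch below builds a small hypothetical stand-in for mydata (the id column and its values are invented for illustration). It also shows two common alternatives, offered as a matter of style rather than as the course's method: seq_len(nrow(mydata)) in place of 1:nrow(mydata), and mydata$age[i] as another way to reach the same cell.

```r
getRandomNumber <- function(min = 1, max = 10, number = 1) {
  sample(min:max, number)
}

# Hypothetical stand-in for mydata with three participants
mydata <- data.frame(id = c("p01", "p02", "p03"))
mydata$age <- NA

# seq_len(nrow(mydata)) gives 1, 2, 3 here, and an empty sequence for a
# data frame with zero rows (1:0 would instead count down: 1, 0)
for (i in seq_len(nrow(mydata))) {
  # mydata$age[i] selects the age column first, then row i
  mydata$age[i] <- getRandomNumber(min = 18, max = 65, number = 1)
}

mydata
```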

If Statements

if statements allow us to only execute certain code if a particular condition is TRUE. The basic syntax of an if statement is:

if (condition) {
  # Code to execute if condition is TRUE
}
  • condition: A logical expression that evaluates to TRUE or FALSE.
  • If condition is TRUE, R runs the code inside the braces { }.
  • If condition is FALSE, R skips the code inside the braces { }.

For example, if we wanted to print "You are an adult." but only if age >= 18, we could use the code:

age <- 21 # assign age a value of 21
if (age >= 18) {
  print("You are an adult.")
}
# Prints [1] "You are an adult." because the condition is TRUE

age <- 17 # assign age a value of 17
if (age >= 18) {
  print("You are an adult.")
}
# Nothing happens because the condition is FALSE

Adding an else clause

Adding an else clause allows us to execute a separate block of code if the initial condition is FALSE. The basic syntax for an if else combination is:

if (condition) {
  # Code if condition is TRUE
} else {
  # Code if condition is FALSE
}

For example, if we wanted to print "You are an adult." for ages >= 18, and "You are not an adult." for ages < 18, we could use the code:

if (age >= 18) {
  print("You are an adult.")
} else {
  print("You are not an adult.")
}

Adding an else if clause

If we want to have another conditional, we can use an else if statement. The basic syntax for this type of statement is:

if (condition1) {
  # Code if condition1 is TRUE
} else if (condition2) {
  # Code if condition2 is TRUE
} else {
  # Code if neither condition1 nor condition2 is TRUE
}

So for our age example, we could add two additional else if statements to make a more nuanced age categorization:

if (age >= 65) {
  print("You are a senior.")
} else if (age >= 18) {
  print("You are an adult.")
} else if (age >= 13) {
  print("You are a teen.")
} else {
  print("You are a child.")
}
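
Because the conditions are checked from top to bottom and only the first TRUE branch runs, their order matters. One way to experiment with this is to wrap the chain in a function; categorizeAge below is a hypothetical name, and it returns the category label instead of printing a sentence.

```r
# Hypothetical helper: returns an age category as a string
categorizeAge <- function(age) {
  if (age >= 65) {
    "senior"
  } else if (age >= 18) {
    "adult"
  } else if (age >= 13) {
    "teen"
  } else {
    "child"
  }
}

categorizeAge(70)  # "senior"
categorizeAge(21)  # "adult"
categorizeAge(15)  # "teen"
categorizeAge(8)   # "child"
```

If the age >= 18 check came first, a 70-year-old would be labelled "adult", because that condition is also TRUE and the later branches would never be reached.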

Nesting for loops and if else statements

You can build more complex code by nesting for loops and if or if else statements. Nesting for loops, by placing one for loop inside another, is useful for operations that involve two or more levels of iteration. For example, you might use nested loops to iterate over the rows and columns of a data frame, or to compare every element of one sequence with every element of another. The basic syntax is:

for (outer_variable in outer_sequence) {
  for (inner_variable in inner_sequence) {
    # Code to execute in the inner loop
  }
  # Code to execute after the inner loop completes for each outer iteration
}
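
As a concrete sketch of this pattern, the nested loops below fill a small multiplication table. The matrix name products and its 3 x 4 size are arbitrary choices for illustration.

```r
# Hypothetical example: fill a 3 x 4 multiplication table
products <- matrix(NA, nrow = 3, ncol = 4)

for (i in 1:nrow(products)) {      # outer loop: one pass per row
  for (j in 1:ncol(products)) {    # inner loop: one pass per column
    products[i, j] <- i * j        # runs 3 * 4 = 12 times in total
  }
}

products
```

The inner loop runs to completion for every single iteration of the outer loop, which is why the cell in row 3, column 4 ends up holding 12.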

You can also place if or if else statements within a for loop. For example, if we wanted to compare the scores for positiveEmotion and negativeEmotion and categorize each participant by their primaryEmotion, we could use the code:

# Initialize the new 'primaryEmotion' column
mydata$primaryEmotion <- NA

# Loop over each row to apply the logic and fill in the 'primaryEmotion' column
for (i in 1:nrow(mydata)) {  # Loop over each row
  # Get the positive and negative emotion values for the current row
  pos_emotion <- mydata$positiveEmotion[i]
  neg_emotion <- mydata$negativeEmotion[i]
  
  # Ensure that we handle NA values gracefully
  if (is.na(pos_emotion) || is.na(neg_emotion)) {
    mydata$primaryEmotion[i] <- "unknown"  # If either is NA, set to "unknown"
  } else if (pos_emotion > neg_emotion) {
    mydata$primaryEmotion[i] <- "positive"  # Positive emotion is greater
  } else if (neg_emotion > pos_emotion) {
    mydata$primaryEmotion[i] <- "negative"  # Negative emotion is greater
  } else {
    mydata$primaryEmotion[i] <- "neutral"  # Both emotions are equal
  }
}

Explanation:

  • For Loop: We loop over each row of the data frame to examine the values in positiveEmotion and negativeEmotion.
  • If/Else Logic:
    • If either positiveEmotion or negativeEmotion is NA, we assign "unknown" to primaryEmotion.
    • If positiveEmotion is greater than negativeEmotion, we assign "positive" to primaryEmotion.
    • If negativeEmotion is greater than positiveEmotion, we assign "negative" to primaryEmotion.
    • If both emotions are equal, we assign "neutral" to primaryEmotion.

While nested for loops are useful and often intuitive, they can be slow with large data sets in R. Vectorized functions, apply() family functions, or tidyverse methods (e.g., map() from purrr) are often better choices for performance when you’re performing straightforward operations across data structures.
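
As one sketch of a vectorized alternative, the primaryEmotion categorization can be written with nested ifelse() calls that process the whole column at once. The small mydata data frame below is invented for illustration, and this is one possible approach rather than the only one.

```r
# Hypothetical data: four participants, one with a missing score
mydata <- data.frame(
  positiveEmotion = c(5, 2, 3, NA),
  negativeEmotion = c(1, 4, 3, 2)
)

pos <- mydata$positiveEmotion
neg <- mydata$negativeEmotion

# ifelse(test, yes, no) checks every element of the vector in one call.
# Note the single | (element-wise OR) instead of || (single-value OR).
mydata$primaryEmotion <- ifelse(
  is.na(pos) | is.na(neg), "unknown",
  ifelse(pos > neg, "positive",
         ifelse(neg > pos, "negative", "neutral"))
)

mydata$primaryEmotion  # "positive" "negative" "neutral" "unknown"
```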