Data Types in R
R has a variety of data types and structures to manage different kinds of information. Here's an overview of the main ones you'll encounter, including atomic data types and data structures like vectors, arrays, data frames, and more.
Data Types
1. Atomic Data Types:
Atomic data types are the simplest types in R and describe the kind of individual value being stored. Even a single value in R is technically a vector of length one.
-
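Numeric: Numeric values represent real numbers (e.g., 3.5, -4, 100). R stores them as double-precision floating-point numbers, so whole numbers such as -4 and 100 are also numeric unless you explicitly make them integers.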
x <- 3.5 # Numeric value
-
Integer: Integers are whole numbers, created by appending an "L" (e.g., 5L, -2L).
x <- 5L # Integer
-
Character (String): Character values are text, or strings, enclosed in quotes ("Hello", 'World').
x <- "Hello, R!" # Character
-
Logical (Boolean): Logical values are either TRUE or FALSE, often used for conditions and comparisons.
x <- TRUE # Logical value
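To check which atomic type R actually assigned, you can call typeof() (the internal storage type) or class(). The minimal sketch below uses illustrative variable names. It shows that a whole number written without the L suffix is still stored as a double.

```r
# One value of each atomic type (variable names are illustrative)
num_val <- 3.5          # numeric, stored as double
whole   <- 100          # still double: no L suffix
int_val <- 5L           # integer, thanks to the L suffix
chr_val <- "Hello, R!"  # character
lgl_val <- TRUE         # logical

typeof(num_val)     # "double"
typeof(whole)       # "double"
typeof(int_val)     # "integer"
typeof(chr_val)     # "character"
typeof(lgl_val)     # "logical"

class(num_val)      # "numeric"
is.integer(whole)   # FALSE
is.integer(int_val) # TRUE
length(num_val)     # 1: a single value is a length-one vector
```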
2. Data Structures:
Data structures are collections of these atomic data types and can represent more complex data.
-
Vector: A vector is a sequence of elements of the same type, making it the most basic data structure in R.
- Vectors can hold any atomic type, but they must be homogeneous (all elements of the same type). If you combine values of different types, R coerces them all to a single common type.
- You can check the type of a vector with typeof() and the structure with str().
- You can create a vector by listing numbers or "words", separated by commas, inside the combine (concatenate) function c(). Note that words must be in quotes.
numeric_vector <- c(1.5, 2.3, 5.0) # Numeric vector
character_vector <- c("apple", "banana", "cherry") # Character vector
typeof(character_vector) # Returns "character" in the console
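Because a vector must be homogeneous, R silently coerces mixed inputs to a common type, moving from logical to integer to double to character as needed. The minimal sketch below shows that coercion along with basic indexing. The variable names are illustrative.

```r
# Mixing types: everything becomes character
mixed <- c(1, "two", TRUE)
typeof(mixed)          # "character"
mixed                  # "1" "two" "TRUE"

# Logicals mixed with numbers become 1 (TRUE) and 0 (FALSE)
nums <- c(1, TRUE, FALSE)
nums                   # 1 1 0
typeof(nums)           # "double"

# Indexing starts at 1 in R
fruits <- c("apple", "banana", "cherry")
fruits[2]              # "banana"
fruits[c(1, 3)]        # "apple" "cherry"
length(fruits)         # 3
str(fruits)            # chr [1:3] "apple" "banana" "cherry"
```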
-
List: A list is a more flexible structure than a vector because it can hold elements of different types, including vectors, other lists, and even functions. Lists are heterogeneous.
my_list <- list(1.5, "apple", TRUE, c(1, 2, 3)) # Mixed elements
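A common point of confusion with lists is the difference between [[ ]] and [ ]. The minimal sketch below reuses my_list from above and adds a named list, person, which is an illustrative example. [[ ]] extracts the element itself, while [ ] returns a smaller list.

```r
my_list <- list(1.5, "apple", TRUE, c(1, 2, 3))

length(my_list)         # 4 elements, each can be a different type
typeof(my_list[[2]])    # "character": [[ ]] extracts the element itself
class(my_list[2])       # "list": [ ] returns a sub-list
my_list[[4]][2]         # 2: second value of the vector stored in slot 4

# Elements can be named and then accessed with $
person <- list(name = "Alice", age = 30, scores = c(85.5, 92.0))
person$name             # "Alice"

# Class of each element
sapply(my_list, class)  # "numeric" "character" "logical" "numeric"
```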
-
Matrix: A matrix is a two-dimensional collection of elements of the same type (usually numeric). You create a matrix with matrix() and define the number of rows and columns.
my_matrix <- matrix(1:9, nrow = 3, ncol = 3) # 3x3 matrix
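By default, matrix() fills its values column by column, which often surprises newcomers. The minimal sketch below shows that fill order, indexing by [row, column], and the byrow argument. The name by_row is illustrative.

```r
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
my_matrix
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9

dim(my_matrix)    # 3 3
my_matrix[2, 3]   # 8: row 2, column 3
my_matrix[1, ]    # 1 4 7: the whole first row

# Fill row by row instead
by_row <- matrix(1:9, nrow = 3, byrow = TRUE)
by_row[1, ]       # 1 2 3
```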
-
Array: An array is a multi-dimensional generalization of a matrix, capable of having more than two dimensions. Elements in an array must be of the same type.
my_array <- array(1:12, dim = c(3, 2, 2)) # 3D array
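Arrays are indexed with one position per dimension. The minimal sketch below uses the same my_array. Values fill the first dimension fastest, just as they do in a matrix, so the first 3x2 layer holds 1 to 6 and the second holds 7 to 12.

```r
my_array <- array(1:12, dim = c(3, 2, 2))

dim(my_array)       # 3 2 2: rows, columns, layers
my_array[, , 1]     # the first 3x2 layer (values 1 to 6)
my_array[1, 2, 2]   # 10: row 1, column 2, layer 2
typeof(my_array)    # "integer": all elements share one type
```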
-
Data Frame: A data frame is a two-dimensional structure similar to a table or spreadsheet. Each column is a vector, so all values within a column share one type, but different columns can have different types (e.g., numeric, character, logical). In other words, a data frame is homogeneous within each column and heterogeneous across columns. You can access columns using $ notation, such as my_data$name, and view the first rows with functions like head().
my_data <- data.frame(
  id = 1:3,
  name = c("Alice", "Bob", "Charlie"),
  score = c(85.5, 92.0, 88.5)
)
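The minimal sketch below shows a few everyday operations on my_data: column access with $, a summary on one column, checking each column's type, and filtering rows by a condition. It assumes R 4.0.0 or later, where data.frame() keeps text columns as character instead of converting them to factors.

```r
my_data <- data.frame(
  id    = 1:3,
  name  = c("Alice", "Bob", "Charlie"),
  score = c(85.5, 92.0, 88.5)
)

my_data$name             # "Alice" "Bob" "Charlie"
mean(my_data$score)      # 88.66667
head(my_data, 2)         # the first two rows
nrow(my_data)            # 3

# Each column keeps its own type (character here assumes R >= 4.0.0)
sapply(my_data, class)   # id "integer", name "character", score "numeric"

# Rows where score is above 88, returning only the name column
my_data[my_data$score > 88, "name"]   # "Bob" "Charlie"
```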
3. Special Data Types:
R also has a few additional object types that don't fit neatly into the categories above.
-
Factor: A factor represents categorical data. It stores each value as an integer code together with the set of unique categories (levels). Factors are commonly used for grouping and statistical modeling.
colors <- factor(c("red", "green", "blue", "green", "red"))
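The minimal sketch below shows what a factor holds internally. Levels are sorted alphabetically by default, and each value is stored as an integer code pointing to its level. The second factor, sizes, is an illustrative example of setting the level order explicitly.

```r
colors <- factor(c("red", "green", "blue", "green", "red"))

levels(colors)       # "blue" "green" "red" (alphabetical by default)
nlevels(colors)      # 3
as.integer(colors)   # 3 2 1 2 3: the integer code behind each value
table(colors)        # counts: blue 1, green 2, red 2
typeof(colors)       # "integer"
class(colors)        # "factor"

# Supply levels explicitly to control their order
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"))
as.integer(sizes)    # 1 3 2
```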
-
Function: Functions are first-class objects in R, meaning you can store them in variables, pass them as arguments, and return them from other functions. You define functions using the function keyword.
my_function <- function(x, y) {
  return(x + y)
}
my_function(3, 5) # Calls the function and returns 8
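Calling a function only scratches the surface of "first-class". The minimal sketch below also passes a function as an argument and returns one from another function. Both apply_twice and make_multiplier are hypothetical helpers written for illustration.

```r
my_function <- function(x, y) {
  x + y                 # the last evaluated expression is returned
}

# Pass a function as an argument (apply_twice is a hypothetical helper)
apply_twice <- function(f, a, b) f(f(a, b), b)
apply_twice(my_function, 3, 5)   # 13: (3 + 5) + 5

# Return a function from another function (make_multiplier is hypothetical)
make_multiplier <- function(n) {
  function(x) x * n     # the inner function remembers n
}
triple <- make_multiplier(3)
triple(4)               # 12
sapply(1:3, triple)     # 3 6 9
class(my_function)      # "function"
```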
Summary of R Data Types and Structures
Type | Description |
---|---|
Atomic Types | Numeric, Integer, Character, Logical |
Vector | 1D, homogeneous data (all same type) |
List | 1D, heterogeneous data (can store multiple types) |
Matrix | 2D, homogeneous data |
Array | Multi-dimensional, homogeneous data |
Data Frame | 2D, heterogeneous data (columns can vary in type) |
Factor | Categorical data with unique levels |
Function | First-class objects for performing tasks |