Video Notes
Let’s explore how R utilizes loops and vectorized operations — two different approaches to performing repetitive tasks. We’ll use a small Stroop task dataset as our example:
stroop_data <- data.frame(
trial = 1:10,
word = c("RED", "GREEN", "BLUE", "YELLOW", "GREEN", "RED", "BLUE", "YELLOW", "RED", "GREEN"),
ink_color = c("RED", "RED", "BLUE", "YELLOW", "GREEN", "GREEN", "RED", "BLUE", "YELLOW", "BLUE"),
condition = c("congruent", "incongruent", "congruent", "congruent",
"congruent", "incongruent", "incongruent", "incongruent",
"incongruent", "incongruent"),
response = c("r", "r", "b", "y", "g", "g", "r", "b", "y", "b"),
correct = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE),
rt = c(520, 780, 495, 510, 490, 810, 735, 760, 505, 790) # reaction times (ms)
)
Explicit Iteration
Most programming languages, including R, provide loops which are programming structures that let you repeat a block of code multiple times, either for a fixed number of iterations or until a condition is met.
Here’s a simple example of a for loop in R:
for (i in 1:5) {
print(paste("This is loop number", i))
}
We can use loops to perform operations on data frame rows. For example, the following code adds a new variable rt_category to our Stroop dataset, categorizing each participant’s response time as fast (≤ 500 ms) or slow(everything else):
# Create a new (empty) column
stroop_data$rt_category <- NA
# Use a for loop to classify each subject
for (i in 1:nrow(stroop_data)) {
if (stroop_data[i, "rt"] <= 500) {
stroop_data[i, "rt_category"] <- "fast"
} else {
stroop_data[i, "rt_category"] <- "slow"
}
}
Vector Operations
The above approach works, but it doesn’t take advantage of one of R’s most powerful features: vectorization.
Many R functions are vectorized, meaning they can operate on an entire vector or column at once rather than looping through individual elements.
Here’s how to rewrite the same operation using R’s vectorized ifelse() function:
stroop_data$rt_category <- ifelse(stroop_data$rt < 500, "fast", "slow")
In this version, ifelse() checks the entire stroop_data$rt column in a single step.
For each element:
- If the condition
stroop_data$rt < 500 is TRUE, then "fast" is assigned.
- If it’s
FALS then "slow" is assigned.
The result is a new vector of the same length as the input column- no explicit looping required.
Advantages of Vector Operations Over Loops
Vector operations are usually preferred in R for two main reasons:
-
Simpler and more expressive code - They’re shorter, easier to read, and less error-prone. Less code = fewer mistakes.
-
Speed and efficiency - Vectorized functions are typically 10–100× faster than explicit loops because they’re optimized internally in compiled code. They perform calculations on whole vectors at once, rather than one element at a time.
When Loops Are Still Useful
While vectorized code is ideal for data manipulation and calculations, loops are still useful in several situations:
-
File operations or API calls: When you need to perform actions that have side effects (e.g., reading files, writing output, printing).
-
Simulations or incremental building: When later iterations depend on the results of earlier ones.
-
Procedural tasks: When clarity or control flow matters more than speed.
For example, here’s a simple loop that reads all CSV files in a folder and prints a summary of each one:
for (file in list.files("data", full.names = TRUE)) {
df <- read.csv(file)
print(summary(df))
}
You could achieve the same result using the vectorized lapply() function:
files <- list.files("data", full.names = TRUE)
lapply(files, function(file) {
df <- read.csv(file)
summary(df)
})
In this case, both approaches work but a loop may feel more natural or readable.
Example: Using a Loop in a Simulation
Loops are especially useful when generating or storing results step by step.
For example, the code below simulates sampling variability by computing the mean of 10 random uniform numbers across 100 samples:
results <- numeric(100)
for (i in 1:100) {
results[i] <- mean(runif(10))
}
Each iteration draws 10 random numbers, computes their mean, and stores it in the vector results.
Summary
- Vector operations make R code concise, efficient, and expressive.
- Loops remain valuable when sequential or side-effect-driven tasks are required.