Video Notes
Using the pipe operator in R helps you write code that reads like a sequence of steps. The result is cleaner code that is easier to read, reason about, and explain.
There are two commonly used pipe operators in R:
- Base R pipe: |> (available in R 4.1+, released in 2021)
- Tidyverse pipe: %>% (introduced around 2014 via the magrittr package and used heavily in dplyr and other tidyverse tools)
For basic examples, these pipes are largely interchangeable. Since we’ll be using other tidyverse functionality in our examples, we’ll start with the tidyverse pipe syntax. Later, we’ll discuss the small but important differences between these pipes and why both options exist.
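To illustrate what "largely interchangeable" means, here is a minimal sketch that runs the same two-step pipeline with each pipe. It uses R's built-in mtcars data rather than the Stroop data, and it assumes dplyr is installed (dplyr re-exports %>% from magrittr). The base pipe needs R 4.1 or later.

```r
library(dplyr)  # provides filter(), select(), and re-exports %>%

# Tidyverse pipe: keep high-mileage cars, then keep two columns
tidy_version <- mtcars %>%
  filter(mpg > 25) %>%
  select(mpg, cyl)

# Base R pipe (R 4.1+): the same steps, written with |>
base_version <- mtcars |>
  filter(mpg > 25) |>
  select(mpg, cyl)

# Both pipes insert the left-hand side as the first argument,
# so the two results should be identical
identical(tidy_version, base_version)
```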
Example Data
For the examples below, we’ll use data from a Stroop experiment. To access this data, install my codepsych package:
install.packages("codepsych", repos = c("https://susanbuck.r-universe.dev"))
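Then load it in your script, along with dplyr, which provides filter(), select(), and the other data-wrangling functions used below (dplyr also makes the %>% pipe available):
library(codepsych)
library(dplyr)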
You will now have access to a demo data set called stroop.
Pipes: A Clear Sequence of Steps
To appreciate the usefulness of pipes, let’s start with a simple filter and select operation written in a traditional “nested” style, without pipes:
results <- select(filter(stroop, reaction_time < 500), condition, reaction_time)
To understand this code, we have to read it inside out. First, filter(stroop, reaction_time < 500) keeps only trials with reaction times under 500 ms. Then the result of that operation is passed into select(), which extracts the condition and reaction_time columns.
Here is the same procedure rewritten using pipes:
results <- stroop %>%
filter(reaction_time < 500) %>%
select(condition, reaction_time)
In this version, we start logically with the data we are working with (stroop). We then pipe that data into filter(), and the result of that operation is piped into select().
The result is code that reads like a clear sequence of instructions.
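If you want to confirm that piping changes only how the code reads, not what it does, here is a small sketch. It uses a hypothetical four-row data frame (toy, with made-up values) standing in for stroop, so it runs without the codepsych package.

```r
library(dplyr)

# Hypothetical stand-in for stroop: four made-up trials
toy <- data.frame(
  participant_id = c(1, 1, 2, 2),
  condition      = c("congruent", "incongruent", "congruent", "incongruent"),
  reaction_time  = c(420, 610, 480, 530)
)

# Nested style: read inside out (filter first, then select)
nested <- select(filter(toy, reaction_time < 500), condition, reaction_time)

# Piped style: read top to bottom, same steps in the same order
piped <- toy %>%
  filter(reaction_time < 500) %>%
  select(condition, reaction_time)

identical(nested, piped)
```

Only the two trials under 500 ms survive, and both versions drop participant_id in the same way.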
Syntax Breakdown
The pipe syntax works by passing whatever appears on the left-hand side into the first argument of the function on the right-hand side.
x %>% y()
Here, the value x is passed as the first argument to the function y(), so x %>% y() is evaluated as y(x).
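The first-argument rule can be checked directly with a couple of base R functions. This is a minimal sketch assuming the magrittr package (the source of %>%) is installed. Any extra arguments you write in the call come after the inserted value.

```r
library(magrittr)  # source of %>% (dplyr re-exports it)

x <- c(1, 4, 9)

# x %>% sqrt() is evaluated as sqrt(x)
identical(x %>% sqrt(), sqrt(x))

# The base pipe follows the same rule (R 4.1+)
identical(x |> sqrt(), sqrt(x))

# Extra arguments follow the inserted value:
# c(1.234, 5.678) %>% round(1) is evaluated as round(c(1.234, 5.678), 1)
rounded <- c(1.234, 5.678) %>% round(1)
rounded  # 1.2 5.7
```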
Another Example
Here’s a more complex example to further emphasize the readability that pipes provide.
The goal of the following code is to:
- Add a new column called correct that checks whether the participant’s response matches the color of the text
- Keep only valid trials by removing incorrect responses and trials outside a reasonable response-time window (250–1500 ms)
- Group the remaining data by participant and condition (congruent vs. incongruent)
- Summarize performance within each group by calculating the number of valid trials (n_trials) and the mean response time (mean_rt)
- Return a clean summary table where each row represents one participant–condition combination
Here is the code using pipes:
stroop_summary <- stroop %>%
mutate(correct = response == substr(color, 1, 1)) %>%
filter(correct, between(reaction_time, 250, 1500)) %>%
group_by(participant_id, condition) %>%
summarize(
n_trials = n(),
mean_rt = mean(reaction_time),
.groups = "drop"
)
Again, notice the clean “do this, then this, then this” structure. Now compare that to the same logic written without pipes:
stroop_summary <- summarize(
group_by(
filter(
mutate(
stroop,
correct = response == substr(color, 1, 1)
),
correct,
between(reaction_time, 250, 1500)
),
participant_id,
condition
),
n_trials = n(),
mean_rt = mean(reaction_time),
.groups = "drop"
)
stroop_summary
This version is functionally identical, but much harder to read and reason about.
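To see the shape of the summary table without installing codepsych, here is a sketch that runs the piped version on a small hypothetical data set. The values are made up, and the encoding of response as the first letter of a color name is an assumption inferred from the substr(color, 1, 1) comparison in the pipeline, not a documented property of stroop.

```r
library(dplyr)

# Hypothetical trials (made-up values); response is assumed to be the
# first letter of the color the participant reported
toy <- data.frame(
  participant_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  condition      = rep(c("congruent", "incongruent"), 4),
  color          = c("red", "blue", "green", "red", "blue", "green", "red", "blue"),
  response       = c("r", "b", "g", "g", "b", "g", "r", "b"),
  reaction_time  = c(450, 700, 1600, 650, 500, 720, 200, 800)
)

toy_summary <- toy %>%
  mutate(correct = response == substr(color, 1, 1)) %>%
  filter(correct, between(reaction_time, 250, 1500)) %>%
  group_by(participant_id, condition) %>%
  summarize(
    n_trials = n(),
    mean_rt = mean(reaction_time),
    .groups = "drop"
  )

# One row per participant and condition (4 rows).
# Dropped: an incorrect trial (row 4) and two correct trials outside
# 250-1500 ms (rows 3 and 7). Participant 2 / incongruent keeps two
# trials (720 and 800 ms), so its mean_rt is 760; every other group keeps one.
toy_summary
```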
Placeholders
Most data-wrangling functions expect the data as their first argument, and pipes take advantage of that. For example:
stroop %>% summarize(mean_rt = mean(reaction_time))
But what if a function does not take the data as its first argument? A common example is lm(). In these cases, you can use placeholders to specify where the data should be inserted.
With the tidyverse pipe (%>%), the placeholder is a period (.):
stroop %>% lm(reaction_time ~ condition, data = .)
With the base R pipe (|>), the placeholder is an underscore (_). It requires R 4.2.0 or later and must be passed to a named argument, such as data = _:
stroop |> lm(reaction_time ~ condition, data = _)
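Here is a runnable sketch of both placeholders using R's built-in mtcars data instead of stroop (the model mpg ~ wt is just an illustration). It assumes magrittr is installed and R is 4.2.0 or later for the _ placeholder. Because lm() takes the formula first, the placeholder is what routes the piped data to the data argument.

```r
library(magrittr)

# Tidyverse pipe: "." marks where mtcars goes
fit_tidy <- mtcars %>% lm(mpg ~ wt, data = .)

# Base R pipe: "_" must be given to a named argument (R 4.2.0+)
fit_base <- mtcars |> lm(mpg ~ wt, data = _)

# Both fit the same model, so the coefficients should match
all.equal(coef(fit_tidy), coef(fit_base))
```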
One advantage of the tidyverse pipe is that its placeholder is more flexible: the period can appear more than once in a call, including inside nested expressions such as .$condition. The base R placeholder may appear only once.
✅ For example, this works with the tidyverse pipe, splitting the data into one data frame per condition:
stroop %>% split(., .$condition)
❌ But the equivalent does not work with the base R pipe, because _ is used twice:
stroop |> split(x = _, f = _$condition)
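When a base R pipeline needs the piped data more than once, one common workaround is to wrap that step in an anonymous function. The sketch below compares this with the magrittr version on R's built-in mtcars data. It assumes magrittr is installed and R 4.1 or later for the \(d) shorthand. It is one option, not the only one.

```r
library(magrittr)

# magrittr: "." is a direct argument here, so mtcars is NOT also
# inserted as the first argument; ".$cyl" reuses the same data
by_cyl_tidy <- mtcars %>% split(., .$cyl)

# Base R pipe: wrap the step in an anonymous function so the data
# (named d here) can be referenced as often as needed
by_cyl_base <- mtcars |> (\(d) split(d, d$cyl))()

identical(by_cyl_tidy, by_cyl_base)
names(by_cyl_base)  # "4" "6" "8"
```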
Tidyverse vs. base R pipes
Now that we understand the basics of pipes, let’s briefly compare tidyverse pipes and base R pipes.
Tidyverse pipes (%>%) came onto the scene first, around 2014, which is why they are far more common in books, tutorials, and existing R code. Many established workflows and teaching materials were built around them.
Base R pipes (|>) were added directly to R in 2021, making them a newer option and, for now, less prevalent. That said, they are increasingly appearing in modern R code, especially in examples that aim to minimize package dependencies.
For most data analysis workflows, either pipe will work, and there is no requirement to choose one universally. Here are a couple of rules of thumb to help you decide which to use:
- Use the base R pipe (|>) if you want to minimize dependencies or are writing code that relies only on base R functionality.
- Use the tidyverse pipe (%>%) if you are already working within the tidyverse, especially when your pipelines require more flexible placeholder behavior or appear in existing tidyverse-style code.
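As a sketch of the first rule of thumb, here is a dependency-free pipeline that loads no packages at all. It uses R's built-in mtcars data, with subset() and aggregate() standing in for dplyr's filter() and group_by()/summarize(). That pairing is an illustrative choice, not a drop-in equivalent. The _ placeholder requires R 4.2.0 or later.

```r
# No library() calls: everything below ships with base R

# Keep manual-transmission cars, then compute mean mpg per cylinder count.
# aggregate()'s formula interface takes the formula first, so the
# placeholder sends the piped data to the named data argument.
manual_mpg <- mtcars |>
  subset(am == 1) |>
  aggregate(mpg ~ cyl, data = _, FUN = mean)

manual_mpg
```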