
R Pipe Operator for Easier to Understand Code

Video Notes

Using the pipe operator in R helps you write code that reads like a sequence of steps. The result is cleaner code that is easier to read, reason about, and explain.

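There are two commonly used pipe operators in R:

- %>%, the tidyverse pipe, which comes from the magrittr package and is available once the tidyverse is loaded
- |>, the base R pipe, which is built into R itself (version 4.1.0 and later)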

For basic examples, these pipes are largely interchangeable. Since we’ll be using other tidyverse functionality in our examples, we’ll start with the tidyverse pipe syntax. Later, we’ll discuss the small but important differences between these pipes and why both options exist.

Example Data

For the examples below, we’ll use data from a Stroop experiment. To access this data, install my codepsych package:

install.packages("codepsych", repos = c("https://susanbuck.r-universe.dev"))

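Then load it in your script, along with the tidyverse, which provides %>% and the data-wrangling functions used below (filter(), select(), mutate(), and so on):

library(codepsych)
library(tidyverse)

You will now have access to a demo data set called stroop.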

Pipes: A Clear Sequence of Steps

To appreciate the usefulness of pipes, let’s start with a simple filter and select operation written in a traditional “nested” style, without pipes:

results <- select(filter(stroop, reaction_time < 500), condition, reaction_time)

To understand this code, we have to read it inside out. First, filter(stroop, reaction_time < 500) keeps only the trials in the stroop data with reaction times under 500 ms. Then, the result of that operation is passed into select(), which extracts the condition and reaction_time columns.

Here is the same procedure rewritten using pipes:

results <- stroop %>% 
  filter(reaction_time < 500) %>% 
  select(condition, reaction_time)

In this version, we start logically with the data we are working with (stroop). We then pipe that data into filter(), and the result of that operation is piped into select().

The result is code that reads like a clear sequence of instructions.
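If you'd like to confirm that the nested and piped styles compute the same result without installing anything, here is a minimal sketch. It makes a few assumptions: a small made-up data frame (toy_stroop) stands in for the stroop data, base R's subset() stands in for filter() and select(), and the built-in |> pipe (R 4.1 or later) stands in for %>%.

```r
# Hypothetical toy data standing in for the stroop data set
toy_stroop <- data.frame(
  condition     = c("congruent", "incongruent", "congruent", "incongruent"),
  reaction_time = c(420, 610, 480, 530),
  color         = c("red", "blue", "green", "red")
)

# Nested style: read inside out (keep fast trials first, then two columns)
nested <- subset(subset(toy_stroop, reaction_time < 500),
                 select = c(condition, reaction_time))

# Piped style: read top to bottom, one step per line
piped <- toy_stroop |>
  subset(reaction_time < 500) |>
  subset(select = c(condition, reaction_time))

identical(nested, piped)  # TRUE
```

Because the pipe simply rewrites the code into the nested call, the two objects are identical; the only difference is how easy each version is to read.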

Syntax Breakdown

The pipe syntax works by passing whatever appears on the left-hand side into the first argument of the function on the right-hand side.

x %>% y()

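Here, the value x is passed as the first argument to the function y(), so x %>% y() is equivalent to y(x). Any other arguments you write inside y() come after it: x %>% y(z) is equivalent to y(x, z).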
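One way to see this rule in action is to compare piped calls with the ordinary calls they stand for. The sketch below uses the base R pipe and built-in functions so it runs without any packages; once the tidyverse is loaded, %>% behaves the same way for these calls. The numbers are arbitrary example values.

```r
x <- c(4, 9, 16)

# x |> sqrt() is rewritten to sqrt(x)
a <- x |> sqrt()
b <- sqrt(x)

# Extra arguments follow the piped value: c(...) |> round(1) means round(c(...), 1)
r1 <- c(1.23, 4.56) |> round(1)
r2 <- round(c(1.23, 4.56), 1)

# Chained pipes nest from left to right: this is sum(sqrt(x))
total <- x |> sqrt() |> sum()
total  # 9, i.e. 2 + 3 + 4
```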

Another Example

Here’s a more complex example to further emphasize the readability that pipes provide.

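The goal of the following code is to:

- flag each trial as correct when the response matches the first letter of the color
- keep only correct trials with reaction times between 250 and 1500 ms
- group the trials by participant and condition
- count the trials and compute the mean reaction time for each group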

Here is the code using pipes:

stroop_summary <- stroop %>%
    mutate(correct = response == substr(color, 1, 1)) %>%
    filter(correct, between(reaction_time, 250, 1500)) %>%
    group_by(participant_id, condition) %>%
    summarize(
        n_trials = n(),
        mean_rt = mean(reaction_time),
        .groups = "drop"
    )

Again, notice the clean “do this, then this, then this” structure. Now compare that to the same logic written without pipes:

stroop_summary <- summarize(
    group_by(
        filter(
            mutate(
                stroop,
                correct = response == substr(color, 1, 1)
            ),
            correct,
            between(reaction_time, 250, 1500)
        ),
        participant_id,
        condition
    ),
    n_trials = n(),
    mean_rt = mean(reaction_time),
    .groups = "drop"
)

stroop_summary

This version is functionally identical, but much harder to read and reason about.
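For readers who want to run the same steps without the codepsych package or the tidyverse, here is a rough base R sketch. It is an approximation, not the original code: the trials data frame is made up, and transform(), subset(), aggregate(), and merge() stand in for mutate(), filter(), and group_by() plus summarize(). It needs R 4.1 or later for |>.

```r
# Hypothetical toy trials; the column names mirror the stroop example
trials <- data.frame(
  participant_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  condition      = rep(c("congruent", "incongruent"), times = 4),
  color          = c("red", "blue", "green", "red", "blue", "green", "red", "blue"),
  response       = c("r", "b", "g", "x", "b", "g", "r", "b"),
  reaction_time  = c(450, 620, 510, 700, 200, 580, 470, 1600)
)

# Flag correct trials, then keep correct trials between 250 and 1500 ms
kept <- trials |>
  transform(correct = response == substr(color, 1, 1)) |>
  subset(correct & reaction_time >= 250 & reaction_time <= 1500)

# Per participant and condition: count trials and average reaction time
by_group <- reaction_time ~ participant_id + condition
n_tbl    <- aggregate(by_group, data = kept, FUN = length)
mean_tbl <- aggregate(by_group, data = kept, FUN = mean)
names(n_tbl)[names(n_tbl) == "reaction_time"]       <- "n_trials"
names(mean_tbl)[names(mean_tbl) == "reaction_time"] <- "mean_rt"

# Join the two summaries on the grouping columns
stroop_summary <- merge(n_tbl, mean_tbl)
stroop_summary
```

The base R version needs extra bookkeeping to combine the two summaries, which is part of why the dplyr pipeline reads so cleanly.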

Placeholders

Most data-wrangling functions expect the data as their first argument, and pipes take advantage of that. For example:

stroop %>% summarize(mean_rt = mean(reaction_time))

But what if a function does not take the data as its first argument? A common example is lm(), whose first argument is the model formula; the data goes to its data argument instead. In these cases, you can use placeholders to specify where the data should be inserted.

With the tidyverse pipe (%>%), the placeholder is a period (.):

stroop %>% lm(reaction_time ~ condition, data = .)

With the base R pipe (|>), the placeholder is an underscore (_). It was added in R 4.2.0 and must be passed to a named argument:

stroop |> lm(reaction_time ~ condition, data = _)
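Since the stroop data comes from a separate package, here is a self-contained sketch of the same idea that uses R's built-in mtcars data in place of stroop (an assumption for illustration only). It needs R 4.2.0 or later for the _ placeholder, and checks that the piped call fits the same model as the ordinary call.

```r
# lm() takes the formula first, so the piped data frame goes to data = _
fit_piped <- mtcars |> lm(mpg ~ wt, data = _)

# The same model written as an ordinary call
fit_plain <- lm(mpg ~ wt, data = mtcars)

# With the tidyverse loaded, the %>% version would be:
#   mtcars %>% lm(mpg ~ wt, data = .)

all.equal(coef(fit_piped), coef(fit_plain))  # TRUE
```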

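One advantage of the tidyverse pipe is that its placeholder is more flexible: the period can be used more than once in a call, and it does not have to be passed to a named argument. The base R placeholder can appear only once, and only as a named argument.

✅ For example, this works with the tidyverse pipe:

stroop %>% lm(reaction_time ~ condition + ., data = .)

Here the pipe fills in data = . with stroop. The period inside the formula is interpreted by lm() itself, where it means "all other columns in the data".

❌ But this does not work with the base R pipe, because _ may appear only once and only as a named argument:

stroop |> lm(reaction_time ~ condition + _, data = _)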
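To see the base R restriction for yourself, and one common way around it, here is a short sketch assuming R 4.2 or later. It only parses the |> version of the ❌ example, so no stroop data is needed. It then shows a widely used workaround: wrapping the step in an anonymous function so the piped data gets a name that can be reused. The name d and the use of the built-in mtcars data are arbitrary choices for illustration.

```r
# The |> version of the failing example; parse() alone triggers the error,
# so the stroop data does not need to exist
bad <- tryCatch(
  parse(text = "stroop |> lm(reaction_time ~ condition + _, data = _)"),
  error = function(e) e
)
inherits(bad, "error")  # TRUE

# Workaround: wrap the step in an anonymous function so the piped data
# gets a name (d) that can be used as many times as needed
half <- mtcars |> (\(d) head(d, n = nrow(d) %/% 2))()
nrow(half)  # 16 (mtcars has 32 rows)
```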

Tidyverse vs. Base R Pipes

Now that we understand the basics of pipes, let’s briefly compare tidyverse pipes and base R pipes.

Tidyverse pipes (%>%) came onto the scene first, around 2014, which is why they are far more common in books, tutorials, and existing R code. Many established workflows and teaching materials were built around them.

Base R pipes (|>) were added directly to R in 2021 (R 4.1.0), making them a newer option and, for now, a less prevalent one. That said, they are increasingly appearing in modern R code, especially in examples that aim to minimize package dependencies.

For most data analysis workflows, either pipe will work, and there is no requirement to choose one universally. Here are a couple of rules of thumb to help you decide which to use:
