There are two primary styles of working in R.
Style A - Base R
The first style involves using R “out of the box”: the techniques, syntax, and functions that come built into R by default. We’ll call this style “base R”.
Style B - Tidyverse
The second style uses the Tidyverse — a collection of packages designed to make common data tasks like wrangling, analysis, and visualization more intuitive and consistent.
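The Tidyverse collection of packages is built around the following 8 core packages: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats. (Since tidyverse 2.0.0, lubridate is also part of the core set.)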
To install these packages on your computer, run:
install.packages("tidyverse")
Then, in your R scripts, you can choose to load all the Tidyverse packages at once:
library(tidyverse)
Or, to keep your script's dependencies explicit and avoid attaching packages you don't use, load only what you need:
library(dplyr)
library(ggplot2)
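If you only need a function or two, another option is to skip attaching the package entirely and call functions with the package::function() syntax. This also sidesteps name clashes: once dplyr is attached, its filter() masks the time-series function stats::filter(). A minimal sketch, assuming dplyr is installed (the variable name efficient is just for illustration):

```r
# Call dplyr's filter() directly, without library(dplyr).
# The package must be installed, but it is not attached to the search path.
efficient <- dplyr::filter(mtcars, mpg > 20)

# Every remaining car has mpg above 20
summary(efficient$mpg)
```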
Once you have Tidyverse set up, let’s explore some of its standout features and compare them to base R.
To highlight the Tidyverse’s expressive syntax, let’s look at the dplyr package. Suppose we want to select the mpg and hp columns from the built-in mtcars dataset, but only for cars with mpg > 20.
Here’s the code to do this in Base R:
results <- mtcars[mtcars$mpg > 20, c("mpg", "hp")]
And here’s the code to do this using the Tidyverse dplyr package:
library(dplyr)
results <- mtcars %>%
filter(mpg > 20) %>%
select(mpg, hp)
Of these two examples, the Tidyverse/dplyr code can be described as more expressive, meaning it almost reads like “plain English”: even with no R experience, you could make a pretty good guess as to what the code does. It even uses plain-English verbs like filter and select, which are intuitive to understand.
The base R example, however, is more obscure and requires an understanding of the base R syntax used, including the square brackets for subsetting and the $ extraction operator.
One of the most prominent features of Tidyverse packages is the pipe operator, written as %>%. The pipe takes the result on its left and passes it as the first argument to the function on its right, so you can chain operations in the order they happen, which often makes code easier to read.
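A small sketch of what the pipe does under the hood, assuming dplyr is loaded (dplyr re-exports %>% from the magrittr package). The first comparison shows that x %>% f(y) is evaluated as f(x, y); the second shows magrittr's dot placeholder for when the data belongs in an argument other than the first:

```r
library(dplyr)  # re-exports the magrittr pipe %>%

# x %>% f(y) is evaluated as f(x, y): the left side becomes the first argument
piped  <- mtcars %>% head(3)
nested <- head(mtcars, 3)
identical(piped, nested)  # TRUE

# When the data belongs in a different argument, the dot marks where it goes
fit <- mtcars %>% lm(mpg ~ hp, data = .)
coef(fit)
```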
For example, here’s the filter and select operation (introduced above) written without pipes, in a traditional “nested” fashion:
results <- filter(select(mtcars, mpg, hp), mpg > 20)
To dissect this code, we have to read it inside out: first, select(mtcars, mpg, hp) yields the mpg and hp columns of mtcars. Then the result of that function is passed to the filter function, which keeps only the rows where mpg > 20.
Here’s the same operation rewritten with pipes, this time filtering first and then selecting (the result is identical):
results <- mtcars %>%
filter(mpg > 20) %>%
select(mpg, hp)
In this code, we start with the data we’re working with, mtcars, and pipe it to the filter function. The result of that operation is then piped to the select function.
The end result is code that reads like a sequence of instructions.
Here’s another example, with more steps, that emphasizes the readability of the Tidyverse compared with base R. The task:
filter the cars to include only fuel-efficient ones, calculate how much horsepower each has per unit of fuel efficiency, sort them by that measure, and narrow the results down to just the relevant columns.
Base R:
results <- mtcars[mtcars$mpg > 20, ]
results$hp_per_mpg <- results$hp / results$mpg
results <- results[order(-results$hp_per_mpg), ]
results <- results[, c("mpg", "hp", "hp_per_mpg")]
Tidyverse:
results <- mtcars %>%
filter(mpg > 20) %>%
mutate(hp_per_mpg = hp / mpg) %>%
arrange(desc(hp_per_mpg)) %>%
select(mpg, hp, hp_per_mpg)
Observe the "do this, then this, then this" nature of the Tidyverse syntax when pipes are used.
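To confirm that the two versions compute the same thing, a quick check might look like the sketch below. The names base_results and tidy_results are introduced here only so both results can exist side by side:

```r
library(dplyr)

# Base R version
base_results <- mtcars[mtcars$mpg > 20, ]
base_results$hp_per_mpg <- base_results$hp / base_results$mpg
base_results <- base_results[order(-base_results$hp_per_mpg), ]
base_results <- base_results[, c("mpg", "hp", "hp_per_mpg")]

# Tidyverse version
tidy_results <- mtcars %>%
  filter(mpg > 20) %>%
  mutate(hp_per_mpg = hp / mpg) %>%
  arrange(desc(hp_per_mpg)) %>%
  select(mpg, hp, hp_per_mpg)

# Compare the two column by column (row names are not compared)
all(mapply(identical, base_results, tidy_results))
```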
In 2021, base R (version 4.1.0) introduced its own native pipe, |>, which works similarly to the Tidyverse pipe %>%, with some exceptions and limitations. Learn more: Differences between the base R and magrittr pipes.
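As a rough illustration, here’s the first filter-and-select example written with the native pipe and base R’s subset() function, so no packages are needed. This assumes R 4.1.0 or later:

```r
# Requires R >= 4.1.0 for the native pipe |>
native <- mtcars |> subset(mpg > 20, select = c(mpg, hp))

# Compare with the bracket-subsetting version from earlier
bracket <- mtcars[mtcars$mpg > 20, c("mpg", "hp")]
all.equal(native, bracket)

# Unlike %>%, the right-hand side of |> must be a function call with
# parentheses: mtcars |> head() works, but mtcars |> head is a syntax error.
first_rows <- mtcars |> head()
```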
Another feature you’ll encounter when using Tidyverse packages is the tibble data structure provided by the tibble package. Tibbles provide a more user-friendly, consistent, and Tidyverse-compatible version of base R’s data frame.
Just like data frames, tibbles are created by specifying vectors to be combined into a table-like structure:
library(tibble)
my_data <- tibble(
name = c("Alice", "Bob"),
age = c(25, 30)
)
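One convenience worth knowing: tibble() builds columns one at a time, so a later column can refer to an earlier one. Base R’s data.frame() evaluates all of its arguments before building the table, so the same call would fail there unless an object named x already existed. A small sketch (the name scores is illustrative):

```r
library(tibble)

# y is computed from x, which was defined earlier in the same call
scores <- tibble(
  x = 1:3,
  y = x * 2
)
scores$y  # 2 4 6
```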
You can convert existing data frames to tibbles:
mtcars_tibble <- as_tibble(mtcars)
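One caveat: mtcars stores each car’s model name as a row name, and tibbles don’t use row names, so as_tibble() drops them by default. To keep them, the rownames argument turns them into a regular column. The column name "model" below is just an illustrative choice:

```r
library(tibble)

# Keep mtcars' row names as a new column called "model"
mtcars_tibble <- as_tibble(mtcars, rownames = "model")
head(mtcars_tibble$model, 3)  # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
```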
There are several advantages to tibbles over data frames, starting with cleaner output when printing. When a tibble has more than 20 rows, printing shows only the first 10, along with as many columns as fit on screen. The output also includes the table’s dimensions, each column’s data type, and row numbers.
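If you want to see more (or fewer) rows than the default, print() on a tibble accepts an n argument. A small sketch:

```r
library(tibble)

mtcars_tibble <- as_tibble(mtcars)

# Default printing: dimensions in the header, column types under the names,
# and only the first 10 of the 32 rows
print(mtcars_tibble)

# Ask for the first 15 rows instead
print(mtcars_tibble, n = 15)
```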
Tibbles also use stricter subsetting: if you reference a column that does not exist with $, you get a warning instead of a silent NULL. This might seem like a nuisance, but being alerted when you reference something that does not exist is better than having your code fail later in a confusing way. Example:
df <- data.frame(x = "text")
df$y # returns NULL silently
tb <- tibble(x = "text")
tb$y # returns NULL with a warning: Unknown or uninitialised column: `y`.
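A related difference: data frames partially match column names with $, while tibbles never do. A short sketch, assuming R 4.0 or later (so strings stay as character rather than becoming factors); the column name text_col is illustrative:

```r
library(tibble)

df <- data.frame(text_col = "a")
tb <- tibble(text_col = "a")

# Data frames silently match the partial name "text" to the column text_col
df$text  # "a"

# Tibbles never partially match: this returns NULL with a warning
tb$text
```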
The above is just a subset of benefits of tibbles. Learn more here: tibble.tidyverse.org.
The name “tibble” is a play on the word “table”. Tibble originally came from the tbl_df class, introduced in the Tidyverse dplyr package. The class name tbl_df stood for “table data frame”. Over time, this evolved into the tibble package and a standalone object class with its own behavior.
This is just a snapshot of what Tidyverse offers. As you explore R, you’ll encounter each package as your needs evolve:
You can browse the full Tidyverse package list here: tidyverse.org/packages.
Regardless of the Tidyverse package you’re working with, you should see a common theme of consistent and intuitive programming style that aims to address some of the “pain points” of base R.
Tidyverse doesn’t replace base R — it extends it.
Most R users write hybrid code that combines both styles. Tidyverse just offers a cleaner interface for common tasks. But you’ll still encounter base R code — in documentation, forums, and older scripts — so understanding both styles is valuable.
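To make the hybrid idea concrete: a tibble is still a data frame underneath, so base R functions work on it alongside dplyr verbs. A small sketch mixing both styles (the names cars_tbl and efficient are illustrative):

```r
library(dplyr)
library(tibble)

cars_tbl <- as_tibble(mtcars, rownames = "model")

# A tibble's class vector ends with "data.frame", so base R treats it as one
class(cars_tbl)  # "tbl_df" "tbl" "data.frame"

# dplyr for the wrangling...
efficient <- cars_tbl %>%
  filter(mpg > 20) %>%
  select(model, mpg, hp)

# ...then base R for a quick summary
mean(efficient$hp)
summary(efficient$mpg)
```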
Pros of base R
Pros of Tidyverse packages
Tidyverse is a fantastic entry point for modern R programming, especially for data work. But it’s not an either/or — learning both styles will make you a much stronger R user.