A dataset may be written in long or wide format.
Consider an example dataset that records each employee's sales for each month.
In long format, we see repeated rows for Employees to encode their sales for each month. These repeated rows make the data “longer”, hence the name long format.
In wide format, we would instead see a single row for each Employee and an additional column for each Month. These additional columns make the data “wider”, hence the name wide format.
Summary:
Wide format is more human-friendly - it’s easy for us to quickly glance at the data and make comparisons.
Long format is more programming-friendly - it’s better for grouping, summarization, visualization, and statistical analysis.
Because both wide and long format have their advantages, it’s useful to know how to translate between the two, a process referred to as "reshaping".
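To make the "programming-friendly" point concrete, here is a minimal sketch of grouped summaries on long-format data. It uses base R's aggregate(), which is my choice for illustration and not something shown in the video, and it rebuilds the same employee sales figures inline so it runs on its own.

```r
# Long format: one row per Employee/Month combination
data_long <- data.frame(
  Employee = rep(c("Alice", "Bob", "Charlie"), each = 3),
  Month    = rep(c("January", "February", "March"), times = 3),
  Sales    = c(5000, 5200, 5100, 4800, 4700, 4900, 5300, 5400, 5500)
)

# Total sales per employee: group by one column, summarize another
totals <- aggregate(Sales ~ Employee, data = data_long, FUN = sum)

# Average sales per month across all employees
monthly_mean <- aggregate(Sales ~ Month, data = data_long, FUN = mean)

print(totals)
print(monthly_mean)
```

Because every value lives in a single Sales column, switching the grouping from Employee to Month only changes the formula. In wide format, the same summaries would require naming each month column explicitly.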
Example data in long format:
data_long <- read.csv(
text = "
Employee,Month,Sales
Alice,January,5000
Alice,February,5200
Alice,March,5100
Bob,January,4800
Bob,February,4700
Bob,March,4900
Charlie,January,5300
Charlie,February,5400
Charlie,March,5500",
stringsAsFactors = FALSE
)
Example data in wide format:
data_wide <- read.csv(
text = "
Employee,January,February,March
Alice,5000,5200,5100
Bob,4800,4700,4900
Charlie,5300,5400,5500",
stringsAsFactors = FALSE
)
To reshape data, we can use the pivot_longer and pivot_wider functions from the tidyr package. In the video, I demonstrate these functions with the demo data above.
Convert from long to wide format:
library(tidyr)
data_wide <- data_long %>%
  pivot_wider(
    names_from = "Month",
    values_from = "Sales"
  )
Convert from wide to long format:
library(tidyr)
data_long <- data_wide %>%
  pivot_longer(
    cols = c("January", "February", "March"),
    names_to = "Month",
    values_to = "Sales"
  )
Instead of listing every column to reshape (which gets cumbersome with many columns), you can indicate which columns not to reshape by negating them with the minus sign (-):
data_long <- data_wide %>%
  pivot_longer(
    cols = -Employee,
    names_to = "Month",
    values_to = "Sales"
  )
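As a quick sanity check, the two functions can be chained to confirm that reshaping loses no information. The sketch below is my own addition, not from the video. It rebuilds the wide demo data inline, and the name round_trip is purely illustrative.

```r
library(tidyr)

# Wide demo data, rebuilt inline so the example is self-contained
data_wide <- data.frame(
  Employee = c("Alice", "Bob", "Charlie"),
  January  = c(5000, 4800, 5300),
  February = c(5200, 4700, 5400),
  March    = c(5100, 4900, 5500)
)

# Wide -> long: every column except Employee becomes a Month/Sales pair
data_long <- pivot_longer(
  data_wide,
  cols = -Employee,
  names_to = "Month",
  values_to = "Sales"
)

# Long -> wide: spread the Month values back out into columns
round_trip <- pivot_wider(
  data_long,
  names_from = Month,
  values_from = Sales
)

# pivot_*() return tibbles, so compare as plain data frames
print(all.equal(as.data.frame(round_trip), data_wide))
```

If the comparison reports differences on your own data, common causes are duplicate identifier rows or missing values, which pivot_wider cannot map back to a single cell.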