One of the simplest and most powerful workflow improvements you can make in R is adopting a clear, consistent project structure.
A clean structure doesn’t just make your files look organized; it makes your work easier to understand, reproduce, share, and scale.
Below is a simple pattern you can follow:
project/
  data/
  R/
  scripts/
  outputs/
  reports/
  README.md
  _quarto.yml
  .gitignore
There is no single “correct” structure for R projects, and different teams may prefer different conventions. The key is not which pattern you choose; it’s that you choose a logical pattern and follow it consistently.
Let’s walk through each component...
This is where your datasets live: the raw input files your analysis starts from, and the processed datasets your scripts generate.
Taking things a step further, it can be useful to separate raw and processed data. This prevents accidental overwriting and makes your workflow easier to audit and reproduce. Raw data should remain untouched; processed data is generated by your scripts.
data/
  raw/
  processed/
This folder contains reusable functions. Think of it as your internal package. These files define building blocks that your scripts can call.
The R/ folder is for reusable logic, not full workflows.
Example:
R/
  clean_scores.R
  fit_score_model.R
  plot_scores.R
Each file defines a function, but does not execute analysis on its own.
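As a sketch, a file like R/clean_scores.R might define a single cleaning function. The column name final_score and the "no scores above 100" rule are assumptions borrowed from the example script shown later; adapt them to your own data.

```r
# R/clean_scores.R
# Hypothetical sketch of a reusable cleaning function.
# Assumes the raw data has a `final_score` column on a 0-100 scale,
# as in the example script in scripts/.
clean_scores <- function(scores_raw) {
  # Drop rows with missing or impossible final scores
  keep <- !is.na(scores_raw$final_score) & scores_raw$final_score <= 100
  scores_raw[keep, , drop = FALSE]
}
```

Note that sourcing this file only defines clean_scores(); it runs no analysis, which is exactly what keeps R/ reusable.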
This is where execution happens. Scripts in this folder load data, call the functions defined in R/, and write results to outputs/.
Example:
scripts/
  scores.R
A typical script might look like this:
# Load packages
library(dplyr)
library(ggplot2)

# Source functions
source("R/clean_scores.R")
source("R/fit_score_model.R")
source("R/plot_scores.R")

# Load raw data
scores_raw <- read.csv("data/raw/scores.csv")

# Clean data
scores <- clean_scores(scores_raw)

# Save cleaned data (useful for downstream steps)
write.csv(scores, "data/processed/scores_clean.csv", row.names = FALSE)

# Quick exploration
scores_raw %>%
  summarise(
    n = n(),
    n_missing_final = sum(is.na(final_score)),
    max_final = max(final_score, na.rm = TRUE),
    mean_hours = mean(study_hours),
    mean_final = mean(final_score, na.rm = TRUE)
  ) %>%
  print()

# Investigate suspicious values
scores_raw %>%
  filter(final_score > 100) %>%
  print()

# Fit model and save summary
model <- fit_score_model(scores)
sink("outputs/model_summary.txt")
print(summary(model))
sink()

# Generate and save plot
p <- plot_scores(scores)
ggsave(
  filename = "outputs/final_score_vs_study_hours.png",
  plot = p,
  width = 8,
  height = 5,
  dpi = 160
)

# Render report
quarto::quarto_render("reports/report.qmd")

# Done
message("Done. See data/processed/ and outputs/ for results.")
As projects grow, it often makes sense to split large scripts into multiple focused scripts, for example:

01_import_and_clean.R
02_analysis.R
03_output.R

For even more scalability, consider organizing your workflow with a package like {targets} to create a structured data pipeline.
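To illustrate, a minimal _targets.R for a project like this might look as follows. This is a sketch, not a complete pipeline: the target names are invented here, and it assumes {targets} is installed and that R/clean_scores.R defines clean_scores().

```r
# _targets.R -- hypothetical sketch of a {targets} pipeline for this project
library(targets)

# Packages each target can use
tar_option_set(packages = c("dplyr", "ggplot2"))

# Load the reusable functions from R/
source("R/clean_scores.R")

list(
  # Track the raw file so downstream targets rerun when it changes
  tar_target(raw_file, "data/raw/scores.csv", format = "file"),
  tar_target(scores_raw, read.csv(raw_file)),
  tar_target(scores, clean_scores(scores_raw))
)
```

Running tar_make() then rebuilds only the targets whose inputs have changed, which is the main payoff over re-running a monolithic script.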
This folder contains Quarto (or similar) files that generate computational reports.
It holds the source files, not the rendered outputs.
Rendered reports should be written to the outputs/ directory. You can configure this using a _quarto.yml file in the project root:
project:
  output-dir: outputs
This keeps source files separate from generated artifacts, reinforcing the principle of separating code from outputs.
A consistent project structure is foundational to a good R workflow: it makes your projects easier to understand, reproduce, share, and scale.