R Project Structure - Best Practices

Video Notes

One of the simplest and most powerful workflow improvements you can make in R is adopting a clear, consistent project structure.

A clean structure doesn’t just make your files look organized; it makes your work easier to understand, reproduce, share, and scale.

Below is a simple pattern you can follow:

project/
    data/
    R/
    scripts/
    outputs/
    reports/
    README.md
    _quarto.yml
    .gitignore

📂 A copy of this directory structure can be downloaded here...
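If you would rather build the folders yourself, the layout above can be sketched in a few lines of base R. The helper name `create_project_skeleton()` is made up for this example, and the files it creates are empty placeholders.

```r
# Sketch: create the project skeleton shown above under a chosen root.
# `create_project_skeleton()` is a hypothetical helper, not part of any package.
create_project_skeleton <- function(root) {
  dirs <- c("data", "R", "scripts", "outputs", "reports")
  files <- c("README.md", "_quarto.yml", ".gitignore")

  # recursive = TRUE also creates `root` itself if it does not exist yet
  for (d in file.path(root, dirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }

  # Create empty placeholder files, leaving any existing ones untouched
  for (f in file.path(root, files)) {
    if (!file.exists(f)) file.create(f)
  }

  invisible(root)
}

# Example: build the skeleton in a temporary directory
project_root <- file.path(tempdir(), "project")
create_project_skeleton(project_root)
list.files(project_root, all.files = TRUE, no.. = TRUE)
```

Pointing `root` at a real path instead of `tempdir()` gives you a starting point you can then fill in.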

There is no single “correct” structure for R projects. Different teams may prefer different conventions. The key is not which pattern you choose but that you choose a logical pattern and follow it consistently.

Let’s walk through each component...

The data/ Folder

This is where your datasets live.

Taking things a step further, it can be useful to separate raw and processed data. This prevents accidental overwriting and makes your workflow easier to audit and reproduce. Raw data should remain untouched. Processed data is generated by your scripts.

data/
    raw/
    processed/
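One way to make the "raw data stays untouched" rule harder to break is to route every write through a small helper. This is only a sketch: `write_processed()` is a hypothetical function, and the default folder assumes the layout above.

```r
# Sketch: a hypothetical guard that only ever writes into data/processed/.
write_processed <- function(df, filename, processed_dir = "data/processed") {
  # Create the processed folder on first use
  dir.create(processed_dir, recursive = TRUE, showWarnings = FALSE)

  # basename() drops any directory part, so a call cannot point back at data/raw/
  out_path <- file.path(processed_dir, basename(filename))

  write.csv(df, out_path, row.names = FALSE)
  invisible(out_path)
}

# Example with a toy data frame and a temporary folder
demo_dir <- file.path(tempdir(), "data", "processed")
out <- write_processed(data.frame(id = 1:3), "../raw/demo.csv", processed_dir = demo_dir)
```

Even though the example passes a path that points at raw/, the file lands in the processed folder.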

The R/ Folder

This folder contains reusable functions. Think of it as your internal package. These files define building blocks that your scripts can call.

The R/ folder is for reusable logic, not full workflows.

Example:

R/
    clean_scores.R
    fit_score_model.R
    plot_scores.R

Each file defines a function, but does not execute analysis on its own.
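To make that concrete, here is a rough idea of what R/clean_scores.R could contain. The cleaning rules are assumptions chosen for illustration (drop missing scores and scores above 100). The column name `final_score` is taken from the script shown later in these notes.

```r
# R/clean_scores.R (hypothetical contents)
# Defines a function only; nothing is read, written, or plotted here.
clean_scores <- function(scores_raw, max_score = 100) {
  # Drop rows with a missing final score or a score above the assumed maximum
  keep <- !is.na(scores_raw$final_score) & scores_raw$final_score <= max_score
  scores_raw[keep, , drop = FALSE]
}

# --- The lines below belong in a script or the console, not in R/clean_scores.R ---
toy <- data.frame(
  study_hours = c(2, 5, 8, 3),
  final_score = c(55, NA, 140, 72)
)
cleaned <- clean_scores(toy)
```

Because the function takes a data frame and returns a data frame, you can try it on a small toy dataset like this before running it on real data.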

The scripts/ Folder

This is where execution happens.

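Scripts in this folder load packages and data, call the functions defined in R/, and write results to data/processed/ and outputs/.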

Example:

scripts/
    scores.R

A typical script might look like this:

# Load packages
library(dplyr)
library(ggplot2)

# Source functions
source("R/clean_scores.R")
source("R/fit_score_model.R")
source("R/plot_scores.R")

# Load raw data
scores_raw <- read.csv("data/raw/scores.csv")

# Clean data
scores <- clean_scores(scores_raw)

# Save cleaned data (useful for downstream steps)
write.csv(scores, "data/processed/scores_clean.csv", row.names = FALSE)

# Quick exploration
scores_raw %>%
  summarise(
    n = n(),
    n_missing_final = sum(is.na(final_score)),
    max_final = max(final_score, na.rm = TRUE),
    mean_hours = mean(study_hours, na.rm = TRUE),
    mean_final = mean(final_score, na.rm = TRUE)
  ) %>%
  print()

# Investigate suspicious values
scores_raw %>%
  filter(final_score > 100) %>%
  print()

# Fit model and save summary
model <- fit_score_model(scores)
writeLines(capture.output(summary(model)), "outputs/model_summary.txt")

# Generate and save plot
p <- plot_scores(scores)
ggsave(
  filename = "outputs/final_score_vs_study_hours.png",
  plot = p,
  width = 8,
  height = 5,
  dpi = 160
)

# Render report
quarto::quarto_render("reports/report.qmd")

# Done
message("Done. See data/processed/ and outputs/ for results.")

As projects grow, it often makes sense to split large scripts into multiple focused scripts.
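One possible way to organize a split like that is to number the scripts and run them in order from a small runner. Everything named here is hypothetical: `run_scripts()`, the `01_clean.R` / `02_model.R` file names, and the numbering convention itself.

```r
# scripts/run_all.R (hypothetical): run each focused script in order.
run_scripts <- function(dir = "scripts") {
  # Pick up numbered scripts such as 01_clean.R, 02_model.R, in sorted order
  files <- sort(list.files(dir, pattern = "^[0-9]+_.*\\.R$", full.names = TRUE))

  for (f in files) {
    message("Running ", f)
    # A fresh environment per script keeps objects from leaking between steps
    source(f, local = new.env())
  }

  invisible(files)
}

# Example: two placeholder scripts in a temporary folder
demo_dir <- file.path(tempdir(), "scripts_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("x <- 1", file.path(demo_dir, "01_clean.R"))
writeLines("y <- 2", file.path(demo_dir, "02_model.R"))
ran <- run_scripts(demo_dir)
```

Since each script runs in its own environment, the steps would need to hand results to each other through files (for example in data/processed/) rather than through objects in memory.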

For even more scalability, consider organizing your workflow with a package like {targets} to create a structured data pipeline.
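As a rough idea of what that could look like for the scores example, here is a sketch of a `_targets.R` file placed in the project root. It assumes the three functions in R/ exist as described above and that the {targets} package is installed. The target names are made up for this example.

```r
# _targets.R (hypothetical, placed in the project root)
library(targets)

# Load the functions defined in R/ (tar_source() sources that folder by default)
tar_source()

# Packages the targets need at run time
tar_option_set(packages = c("dplyr", "ggplot2"))

list(
  # Track the raw file so downstream targets rerun when it changes
  tar_target(raw_file, "data/raw/scores.csv", format = "file"),
  tar_target(scores_raw, read.csv(raw_file)),
  tar_target(scores, clean_scores(scores_raw)),
  tar_target(model, fit_score_model(scores)),
  tar_target(score_plot, plot_scores(scores))
)
```

With a file like this in place, calling `targets::tar_make()` from the project root runs the pipeline and skips any step whose inputs have not changed.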

The reports/ Folder

This folder contains Quarto (or similar) files that generate computational reports.

It holds the source files, not the rendered outputs.

Rendered reports should be written to the outputs/ directory. You can configure this using a _quarto.yml file in the project root:

project:
  output-dir: outputs

This keeps source files separate from generated artifacts, reinforcing the principle of separating code from outputs.

Final Takeaway

A consistent project structure is foundational to a good R workflow. It makes your work easier to understand, reproduce, share, and scale.
