One of the simplest and most powerful workflow improvements you can make in R is adopting a clear, consistent project structure.
A clean structure doesn’t just make your files look organized; it makes your work easier to understand, reproduce, share, and scale.
Below is a simple pattern you can follow:
project/
  data/
  R/
  scripts/
  outputs/
  reports/
  README.md
  _quarto.yml
  .gitignore
There is no single “correct” structure for R projects, and different teams may prefer different conventions. The key is not which pattern you choose; it’s that you choose a logical pattern and follow it consistently.
Let’s walk through each component...
This is where your datasets live: the raw input files your analysis starts from, and the processed datasets your scripts generate.
Taking things a step further, it can be useful to separate raw and processed data. This prevents accidental overwriting and makes your workflow easier to audit and reproduce. Raw data should remain untouched; processed data is generated by your scripts.
data/
  raw/
  processed/
This folder contains reusable functions. Think of it as your internal package. These files define building blocks that your scripts can call.
The R/ folder is for reusable logic, not full workflows.
Example:
R/
  clean_scores.R
  fit_score_model.R
  plot_scores.R
Each file defines a function, but does not execute analysis on its own.
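As a sketch, a file like R/clean_scores.R might define a single cleaning function. The column name final_score and the "no scores above 100" rule are assumptions borrowed from the example script shown later; adapt them to your own data.

```r
# R/clean_scores.R
# Hypothetical sketch of a reusable cleaning function.
# Assumes the raw data has a `final_score` column on a 0-100 scale,
# as in the example script in scripts/.
clean_scores <- function(scores_raw) {
  # Drop rows with missing or impossible final scores
  keep <- !is.na(scores_raw$final_score) & scores_raw$final_score <= 100
  scores_raw[keep, , drop = FALSE]
}
```

Note that sourcing this file only defines clean_scores(); it runs no analysis, which is exactly what keeps R/ reusable.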
This is where execution happens. Scripts in this folder load data, call the functions defined in R/, and write results to outputs/.
Example:
scripts/
  scores.R
A typical script might look like this:
# Load packages
library(dplyr)
library(ggplot2)

# Source functions
source("R/clean_scores.R")
source("R/fit_score_model.R")
source("R/plot_scores.R")

# Load raw data
scores_raw <- read.csv("data/raw/scores.csv")

# Clean data
scores <- clean_scores(scores_raw)

# Save cleaned data (useful for downstream steps)
write.csv(scores, "data/processed/scores_clean.csv", row.names = FALSE)

# Quick exploration
scores_raw %>%
  summarise(
    n = n(),
    n_missing_final = sum(is.na(final_score)),
    max_final = max(final_score, na.rm = TRUE),
    mean_hours = mean(study_hours),
    mean_final = mean(final_score, na.rm = TRUE)
  ) %>%
  print()

# Investigate suspicious values
scores_raw %>%
  filter(final_score > 100) %>%
  print()

# Fit model and save summary
model <- fit_score_model(scores)
sink("outputs/model_summary.txt")
print(summary(model))
sink()

# Generate and save plot
p <- plot_scores(scores)
ggsave(
  filename = "outputs/final_score_vs_study_hours.png",
  plot = p,
  width = 8,
  height = 5,
  dpi = 160
)

# Render report
quarto::quarto_render("reports/report.qmd")

# Done
message("Done. See data/processed/ and outputs/ for results.")
As projects grow, it often makes sense to split large scripts into multiple focused scripts, for example:

01_import_and_clean.R
02_analysis.R
03_output.R

For even more scalability, consider organizing your workflow with a package like {targets} to create a structured data pipeline.
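To illustrate, a minimal _targets.R for a project like this might look as follows. This is a sketch, not a complete pipeline: the target names are invented here, and it assumes {targets} is installed and that R/clean_scores.R defines clean_scores().

```r
# _targets.R -- hypothetical sketch of a {targets} pipeline for this project
library(targets)

# Packages each target can use
tar_option_set(packages = c("dplyr", "ggplot2"))

# Load the reusable functions from R/
source("R/clean_scores.R")

list(
  # Track the raw file so downstream targets rerun when it changes
  tar_target(raw_file, "data/raw/scores.csv", format = "file"),
  tar_target(scores_raw, read.csv(raw_file)),
  tar_target(scores, clean_scores(scores_raw))
)
```

Running tar_make() then rebuilds only the targets whose inputs have changed, which is the main payoff over re-running a monolithic script.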
This folder contains Quarto (or similar) files that generate computational reports.
It holds the source files, not the rendered outputs.
Rendered reports should be written to the outputs/ directory. You can configure this using a _quarto.yml file in the project root:
project:
  output-dir: outputs
This keeps source files separate from generated artifacts, reinforcing the principle of separating code from outputs.
A consistent project structure is foundational to a good R workflow: it makes your projects easier to understand, reproduce, share, and scale.