← Other topics

R Targets Package Tutorial: Build Reproducible Data Pipelines

Video Notes

The {targets} package helps you build reproducible, scalable data analysis pipelines in R. Instead of running scripts manually from top to bottom, {targets} lets you define your workflow as a series of steps with explicit dependencies.

It then tracks what depends on what, detects what changed, and re-runs only what is necessary.

This makes your projects faster, cleaner, and reproducible.

{targets} is especially useful when:

For small, one-off scripts, {targets} may be unnecessary. But for multi-step, iterative projects, it provides structure and safety.

Core Concept: Targets

A target is an object produced by your workflow.

It could be a cleaned dataset, a trained model, a plot, a report, etc.

Each target declares its dependencies. {targets} builds a dependency graph and only re-runs targets affected by changes.

Example

For an example, we will set up a pipeline to:

  1. Read in some student test scores (scores.csv)
  2. Clean the data
  3. Fit a regression model
  4. Generate a plot

To begin, install the {targets} package:

install.packages("targets")

In your project directory, create a file named _targets.R. This file defines your pipeline.

library(targets)

# Declare packages used in the pipeline
tar_option_set(packages = c("dplyr", "ggplot2"))

# List steps of your pipeline
list(
 
  # Step 1: Register the data file
  tar_target(file, "data/scores.csv", format = "file"),

  # Step 2: Load raw student data
  tar_target(
    scores_raw,
    read.csv(file)
  ),

  # Step 3: Remove students missing final exam scores
  tar_target(
    scores,
    scores_raw %>%
      filter(!is.na(final_score))
  ),

  # Step 4: Fit regression model (predict final exam score from study hours)
  tar_target(
    model,
    lm(final_score ~ study_hours, data = scores)
  ),

  # Step 5: Generate plot
  tar_target(
    plot,
    ggplot(scores, aes(x = study_hours, y = final_score)) +
      geom_point() +
      geom_smooth(method = "lm", se = FALSE) +
      labs(
        title = "Final Exam Score vs. Study Hours",
        x = "Study Hours",
        y = "Final Exam Score"
      )
  )
)

Observe that each tar_target() defines:

  1. A name (e.g. file, scores_raw, scores, model, plot)
  2. The code that produces it

Dependencies are detected automatically. E.g. Both model and plot depends on scores, and scores depends on scores_raw.

Running the Pipeline

To execute the pipeline, run the command:

tar_make()

Upon execution, the {targets} package will check which targets are outdated and run only those. It will skip everything else that is already up to date. For example, If you change only the modeling code, only model re-runs.

Accessing Results

Results are stored in the _targets/ directory, not in your global environment.

To retrieve a result:

tar_read(plot)

To load it into your workspace:

tar_load(plot)

Forcing Targets to Re-run

To re-run a specific target and its downstream dependencies:

tar_invalidate(model)
tar_make()

To rebuild everything:

tar_destroy(destroy = "objects")
tar_make()

Organizing Pipelines with Functions

As projects grow, placing all code directly in _targets.R becomes hard to maintain. A better approach is to move computation into functions stored in an R/ directory.

For example, create R/generate_plot.R:

generate_plot <- function(data) {
  ggplot(data, aes(x = study_hours, y = final_score)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    labs(
      title = "Final Exam Score vs. Study Hours",
      x = "Study Hours",
      y = "Final Exam Score"
    )
}

Then add tar_source() to _targets.R before your pipeline:

tar_source()

Then, update the plot target:

tar_target(
    plot,
    generate_plot(clean_data)
  )

Visualizing the Pipeline

To see the dependency graph, run:

tar_visnetwork()

This shows how targets connect. It is useful for understanding and debugging your pipeline structure.

Green: up to date Blue: Outdated and needs to run

You can also see a summary of target commands via:

tar_manifest(fields = all_of("command"))

Example:

> tar_manifest(fields = all_of("command"))
# A tibble: 5 × 2
  name       command                                       
  <chr>      <chr>                                         
1 file       "\"data/scores.csv\""                         
2 scores_raw "read.csv(file)"                              
3 scores     "scores_raw %>% filter(!is.na(final_score))"  
4 plot       "generate_plot(scores)"                       
5 model      "lm(final_score ~ study_hours, data = scores)"

Final Code

targets.R_:

library(targets)

# Declare packages used in the pipeline
tar_option_set(packages = c("dplyr", "ggplot2"))

tar_source()

# List steps of your pipeline
list(
  # Step 1: Register the data file
  tar_target(file, "data/scores.csv", format = "file"),

  # Step 2: Load raw student data
  tar_target(
    scores_raw,
    read.csv(file)
  ),

  # Step 3: Remove students missing final exam scores
  tar_target(
    scores,
    scores_raw %>%
      filter(!is.na(final_score)) %>%
      filter(final_score < max(final_score, na.rm = TRUE))
  ),

  # Step 4: Fit regression model (predict final exam score from study hours)
  tar_target(
    model,
    lm(final_score ~ study_hours, data = scores)
  ),

  # Step 5: Generate plot
  tar_target(
    plot,
    generate_plot(scores)
  )
)

R/generate_plot.r:

generate_plot <- function(scores) {
  ggplot(scores, aes(x = study_hours, y = final_score)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE) +
    labs(
      title = "Final Exam Score vs Study Hours",
      x = "Study Hours",
      y = "Final Exam Score"
    )
}

Unlock all the notes for $4

No subscriptions, no auto-renewals.

Just a simple one-time payment that helps support my free, to-the-point videos without sponsered ads.

Unlocking gets you access to the notes for this video plus all 200+ guides on this site.

Your support is appreciated. Thank you!

Payment Info

/
$4 6 months
$25 forever
Please check the form for errors
Questions? help@codewithsusan.com
← Other topics