
R / RStudio Sessions - Best Practices for Reproducibility (R Simplified)

Video Notes

When you open RStudio, a new R session begins. When you close RStudio, that session ends.

Understanding R sessions is the first step to ensuring reproducibility, efficiency, and organization in your R projects.

There are several components contained in an R session, but the three most important ones to understand are:

  1. The working directory
  2. The workspace
  3. Loaded packages

Let’s dig into each...

1. The Working Directory

The R session contains your current working directory: the default location where R reads and writes files. Any relative file path you use is resolved against it.

You can see the current working directory using the getwd() function.

Alternatively, in RStudio’s Files pane, click More > Go To Working Directory.

Two options for inspecting the current working directory in RStudio

It’s important to understand what your working directory is set to because it will be relevant when opening external data files, writing results, etc.

In the example shown in the video, I set up an example project on my computer’s Desktop in a directory called demo. Within this directory I had a single script, demo.R, and a data subdirectory containing a single data file called employee-attrition.csv. Here's an outline of the setup:

/Users/Susan/Desktop/demo/
  demo.R
  data/
    employee-attrition.csv

Then, in the demo.R file, I ran this line to attempt to load the CSV file:

my_data <- read.csv('data/employee-attrition.csv')

My current working directory was my home directory, /Users/Susan, so this line failed with an error: R attempted to load data/employee-attrition.csv relative to my home directory, and there is no file at /Users/Susan/data/employee-attrition.csv.
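When a read fails like this, it can help to print exactly where R is looking before assuming the file is missing. Here is a minimal sketch using base R's getwd(), file.path(), and file.exists(); the variable names rel_path and full_path are just illustrative:

```r
# Relative path as written in demo.R
rel_path <- "data/employee-attrition.csv"

# Where relative paths are currently resolved from
cat("Working directory:", getwd(), "\n")

# The full path R will try to open for this relative path
full_path <- file.path(getwd(), rel_path)
cat("Looking for:", full_path, "\n")

# Only read the file if it exists; otherwise report where R looked
if (file.exists(rel_path)) {
  my_data <- read.csv(rel_path)
} else {
  message("No file at ", full_path)
}
```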

One way to address this is to adjust the path to be relative to my working directory (/Users/Susan) by adding the Desktop/demo portion of the path:

my_data <- read.csv('Desktop/demo/data/employee-attrition.csv')

Alternatively, I could “hard code” my file reference using an absolute path like so:

my_data <- read.csv('/Users/Susan/Desktop/demo/data/employee-attrition.csv')

However, neither of the above is considered best practice, because both paths are specific to my computer: if I shared my code with a collaborator, it would not work on their system. Instead, a better approach is to set my working directory (via the setwd() function) to the demo folder where the project files live:

setwd('/Users/Susan/Desktop/demo')

Then I could simplify all of my file reference paths because they would all be relative to the current project folder:

my_data <- read.csv('data/employee-attrition.csv')
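setwd() invisibly returns the directory that was active before the change, which makes it easy to switch into a project folder and later switch back. The sketch below is self-contained under some assumptions: it builds a throwaway stand-in for the demo folder under tempdir() and writes a tiny placeholder CSV with made-up values (not the real employee-attrition data), purely so there is something to read.

```r
# Build a temporary stand-in for the demo project (hypothetical layout)
project_dir <- file.path(tempdir(), "demo")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Placeholder data with made-up values, only so there is a file to read
write.csv(
  data.frame(satisfaction_level = c(0.2, 0.5, 0.9)),
  file.path(project_dir, "data", "employee-attrition.csv"),
  row.names = FALSE
)

# Switch into the project folder; setwd() returns the previous directory
old_wd <- setwd(project_dir)

# Relative paths now resolve against the project folder
my_data <- read.csv("data/employee-attrition.csv")
print(nrow(my_data))

# Switch back to wherever we started
setwd(old_wd)
```

Restoring the previous directory like this keeps a script from silently changing the working directory for whatever code runs after it.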

If you’re looking for a best-practice way of dealing with working directories, check out my guide on RStudio Projects.

2. The Workspace

The R session contains all the objects (data, variables, functions) stored in memory, referred to as the workspace.

You can see all objects currently in the workspace using the ls() function.

Alternatively, you can inspect the workspace via RStudio’s Environment pane.

Two ways to see objects in the current workspace in RStudio
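Here is a quick sketch of ls() at work, along with the base R functions rm() and exists() to remove an object and confirm it's gone. The object names are arbitrary examples:

```r
# Create a few objects in the workspace (the global environment)
x <- 42
scores <- c(0.2, 0.5, 0.9)
greet <- function(name) paste("Hello,", name)

# List the names of all objects currently in the workspace
print(ls())

# Remove one object, then check that it no longer exists
rm(scores)
print(exists("scores"))
```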

Workspaces can be persisted across sessions by recording their contents to a .RData file that can be re-opened later in a new session. You can do this via the Open and Save buttons in RStudio’s Environment pane, or the following commands:

Save the current workspace:

save.image("~/myWorkspace.RData")

Load a saved workspace:

load("~/myWorkspace.RData")
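To see the save/load round trip without touching your home directory, the sketch below saves to a temporary file from tempfile() instead of ~/myWorkspace.RData, removes two objects to mimic a fresh session, and restores them with load(), which invisibly returns the names of the objects it restored. The object names are placeholders.

```r
# Example objects to persist
x <- 1:5
label <- "demo"

# Temporary stand-in for ~/myWorkspace.RData
ws_file <- tempfile(fileext = ".RData")

# Save every object in the global environment to the file
save.image(ws_file)

# Mimic a fresh session by removing the objects
rm(x, label)

# Restore them; load() returns the names of the restored objects
restored <- load(ws_file)
print(restored)
print(x)
```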

However, using workspaces in this way is not considered best practice. Instead, make sure your R scripts contain the code needed to recreate every object your project uses.

This improves reproducibility when sharing work with collaborators: instead of saying “here’s what we’re working with”, you’re effectively saying “here are the steps to create what we need to work with”. The latter approach yields a more transparent, adaptable, and reproducible product.

By default, when you close an RStudio session, it asks whether you want to save your workspace. For the reasons above, this should be avoided, so it’s recommended that you disable the prompt via Tools > Global Options > General by setting Save workspace to .RData on exit to Never.

Disabling the prompt to save the workspace on exit.

3. Currently Loaded Packages

The R session contains all of your currently loaded packages. This will include a set of base packages that are loaded with all sessions, as well as any packages that have been loaded into the current session via the library() function.

Run the command search() to see the environments and packages currently loaded in your session. Alternatively, you can examine packages via RStudio’s Packages pane.
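For example, you can compare the search path before and after attaching a package. The tools package ships with R but typically isn't attached in a standard session, so it serves as a convenient test case here:

```r
# Environments and packages currently on the search path
before <- search()
print(before)

# Attach a package that ships with R
library(tools)

# Show what attaching added (appears as "package:tools")
after <- search()
print(setdiff(after, before))
```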

Inspecting the currently loaded packages in RStudio

When it comes to working with packages, it’s a two-step process:

  1. Make sure the package is installed on your system; to install a package you can use the install.packages() function.
  2. Load the package into the current R session via the library() function so it’s available from the scripts that need it.

Of these steps, only step 2 should be included as part of your R scripts. Here’s why:

  1. Installing a package is a one-time action: The install.packages() function downloads and installs the package on your system. Once installed, the package remains available on your computer until you remove it. Including install.packages() in an R script is unnecessary because you only need to install the package once, and repeatedly running it slows execution.
  2. Loading a package is needed in every session: Since R does not persist loaded packages between sessions, every time you start a new session, you must explicitly load any packages your scripts need.

Furthermore, if collaborators open your script in RStudio and it loads a package that is not yet installed on their system, RStudio will prompt them to install it (in plain R, library() stops with an error naming the missing package). Because of this, encoding the "install this package" step in your code is not necessary.
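Outside RStudio's editor prompt, library() itself does not install anything; it signals an error for a package that isn't installed. The sketch below captures that error with tryCatch(). The package name is deliberately made up so the call fails:

```r
# Deliberately nonexistent package name (hypothetical) to trigger the error
pkg <- "notARealPackage123"

# Capture the error message instead of stopping the script
result <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) conditionMessage(e)
)

print(result)
```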

If you do decide to include package installation in your script, you can make it conditional so that a package is only installed when it isn't already. Here’s an example using the ggplot2 package:

if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library("ggplot2")
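The same idea extends to several packages at once. Below is a minimal sketch wrapped in a helper; the name load_packages() is hypothetical, not a function from any package. It installs only the packages that requireNamespace() can't find, then attaches all of them with library().

```r
# Hypothetical helper: install any missing packages, then attach them all
load_packages <- function(pkgs) {
  # TRUE for packages that are already installed and loadable
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

  # Install only the missing ones
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) {
    install.packages(to_install)
  }

  # Attach each package; character.only lets library() accept a string
  for (p in pkgs) {
    library(p, character.only = TRUE)
  }

  invisible(pkgs)
}

# Example usage (commented out so this sketch doesn't install anything):
# load_packages(c("ggplot2", "dplyr"))
```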

Other Things in The Session

In addition to your workspace, current working directory, and loaded packages, an R session also contains the following components:

Restarting the Session

Three ways to restart the session in RStudio:

  1. Click Session > Restart R
  2. Run the command .rs.restartR()
  3. Close and re-open RStudio

Reasons you might want to restart the session:

Appendix

Code shown in video:

library("ggplot2")

my_data <- read.csv('data/employee-attrition.csv')

ggplot(my_data, aes(x = satisfaction_level)) +
  geom_histogram(binwidth = 0.05, fill = "blue", color = "black")

Source of employee-attrition.csv example data...
