Last updated on 2024-06-28
Overview
Questions
- How do you make plots using R?
- How do you customize and modify plots?
Objectives
- Produce scatter plots, boxplots, and time series plots using ggplot.
- Set universal plot settings.
- Describe what faceting is and apply faceting in ggplot.
- Modify the aesthetics of an existing ggplot plot (including axis labels and color).
- Build complex and customized plots from data in a data frame.
We start by loading the required packages. ggplot2 is included in the tidyverse package.
R
library(tidyverse)
If it is no longer in the workspace, load the data we saved in the previous lesson.
R
surveys_complete <- read_csv("data/surveys_complete.csv")
Plotting with ggplot2
ggplot2 is a plotting package that provides helpful commands to create complex plots from data in a data frame. It provides a more programmatic interface for specifying what variables to plot, how they are displayed, and general visual properties. Therefore, we only need minimal changes if the underlying data change or if we decide to change from a bar plot to a scatterplot. This helps in creating publication-quality plots with minimal adjustment and tweaking.
ggplot2 refers to the name of the package itself. When using the package we use the function ggplot() to generate the plots, so references to the function will be written as ggplot() and the package as a whole as ggplot2.
ggplot2 plots work best with data in the ‘long’ format, i.e., a column for every variable, and a row for every observation. Well-structured data will save you lots of time when making figures with ggplot2.
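To make the ‘long’ format concrete, here is a minimal sketch that reshapes a small wide table into one row per observation with pivot_longer() from tidyr (part of the tidyverse). The wide_counts data frame and its values are made up for illustration and are not part of the survey data.

```r
library(tidyr)

# Hypothetical wide table: one column per year (not part of the lesson data)
wide_counts <- data.frame(
  genus  = c("Dipodomys", "Perognathus"),
  `1990` = c(12, 7),
  `1991` = c(15, 9),
  check.names = FALSE
)

# Long format: one row per genus-year observation,
# with the former column names stored in `year` and the cell values in `n`
long_counts <- pivot_longer(
  wide_counts,
  cols = c(`1990`, `1991`),
  names_to = "year",
  values_to = "n"
)

long_counts
```

In the long version, year and n are ordinary columns, so they can be mapped directly to x and y inside aes().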
ggplot graphics are built layer by layer by adding new elements. Adding layers in this fashion allows for extensive flexibility and customization of plots.
To build a ggplot, we will use the following basic template that can be used for different types of plots:
ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>()
- use the ggplot() function and bind the plot to a specific data frame using the data argument
R
ggplot(data = surveys_complete)
- define an aesthetic mapping (using the aesthetic (aes) function), by selecting the variables to be plotted and specifying how to present them in the graph, e.g., as x/y positions or characteristics such as size, shape, color, etc.
R
ggplot(data = surveys_complete, mapping = aes(x = weight, y = hindfoot_length))
- add ‘geoms’: graphical representations of the data in the plot (points, lines, bars). ggplot2 offers many different geoms; we will use some common ones today, including:
  - geom_point() for scatter plots, dot plots, etc.
  - geom_boxplot() for, well, boxplots!
  - geom_line() for trend lines, time series, etc.
To add a geom to the plot use the + operator. Because we have two continuous variables, let’s use geom_point() first:
R
ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) + geom_point()

The + in the ggplot2 package is particularly useful because it allows you to modify existing ggplot objects. This means you can easily set up plot “templates” and conveniently explore different types of plots, so the above plot can also be generated with code like this:
R
# Assign plot to a variable
surveys_plot <- ggplot(data = surveys_complete, mapping = aes(x = weight, y = hindfoot_length))

# Draw the plot
surveys_plot + geom_point()
Notes
- Anything you put in the ggplot() function can be seen by any geom layers that you add (i.e., these are universal plot settings). This includes the x- and y-axis you set up in aes().
- You can also specify aesthetics for a given geom independently of the aesthetics defined globally in the ggplot() function.
- The + sign used to add layers must be placed at the end of each line containing a layer. If, instead, the + sign is placed at the start of the line containing the next layer, ggplot2 will not add the new layer and will return an error message.
- You may notice that we sometimes reference ‘ggplot2’ and sometimes ‘ggplot’. To clarify, ‘ggplot2’ is the name of the most recent version of the package. However, any time we call the function itself, it’s just called ‘ggplot’.
- The previous version of the ggplot2 package, called ggplot, which also contained the ggplot() function, is now unsupported and has been removed from CRAN in order to reduce accidental installations and further confusion.
R
# This is the correct syntax for adding layers
surveys_plot +
  geom_point()

# This will not add the new layer and will return an error message
surveys_plot
  + geom_point()
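The first two notes can be illustrated with a minimal sketch: aesthetics set in ggplot() are shared by every layer, while aesthetics set inside a geom apply to that layer only. The toy data frame below is a hypothetical stand-in for surveys_complete so that the example runs on its own.

```r
library(ggplot2)

# Hypothetical toy data standing in for surveys_complete
toy <- data.frame(
  weight          = c(10, 20, 30, 40),
  hindfoot_length = c(15, 22, 30, 35),
  species_id      = c("A", "A", "B", "B")
)

# x and y are set globally in ggplot(), so both layers can see them.
# color is mapped only inside geom_point(), so geom_line() does not use it.
p <- ggplot(data = toy, mapping = aes(x = weight, y = hindfoot_length)) +
  geom_line() +
  geom_point(mapping = aes(color = species_id))

p
```

Moving color = species_id into the ggplot() call instead would make it visible to geom_line() as well, which would then draw one line per species.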
Challenge (optional)
Scatter plots can be useful exploratory tools for small datasets. For data sets with large numbers of observations, such as the surveys_complete data set, overplotting of points can be a limitation of scatter plots. One strategy for handling such settings is to use hexagonal binning of observations. The plot space is tessellated into hexagons. Each hexagon is assigned a color based on the number of observations that fall within its boundaries. To use hexagonal binning with ggplot2, first install the R package hexbin from CRAN:
R
install.packages("hexbin")
library(hexbin)
Then use the geom_hex() function:
R
surveys_plot + geom_hex()
- What are the relative strengths and weaknesses of a hexagonal bin plot compared to a scatter plot? Examine the above scatter plot and compare it with the hexagonal bin plot that you created.
Building your plots iteratively
Building plots with ggplot2 is typically an iterative process. We start by defining the dataset we’ll use, lay out the axes, and choose a geom:
R
ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) + geom_point()

Then, we start modifying this plot to extract more information from it. For instance, we can add transparency (alpha) to avoid overplotting:
R
ggplot(data = surveys_complete, aes(x = weight, y = hindfoot_length)) + geom_point(alpha = 0.1)

We can also add colors for all the points:
R
ggplot(data = surveys_complete, mapping = aes(x = weight, y = hindfoot_length)) + geom_point(alpha = 0.1, color = "blue")

Or to color each species in the plot differently, you could use a vector as an input to the argument color. ggplot2 will provide a different color corresponding to different values in the vector. Here is an example where we color with species_id:
R
ggplot(data = surveys_complete, mapping = aes(x = weight, y = hindfoot_length)) + geom_point(alpha = 0.1, aes(color = species_id))

Challenge
Use what you just learned to create a scatter plot of weight over species_id with the plot types showing in different colors. Is this a good way to show this type of data?
R
ggplot(data = surveys_complete, mapping = aes(x = species_id, y = weight)) + geom_point(aes(color = plot_type))

Boxplot
We can use boxplots to visualize the distribution of weight within each species:
R
ggplot(data = surveys_complete, mapping = aes(x = species_id, y = weight)) + geom_boxplot()

By adding points to the boxplot, we can have a better idea of the number of measurements and of their distribution. Because the boxplot shows outliers by default, these points would be plotted twice: once by geom_boxplot and once by geom_jitter. To avoid this, we tell the boxplot not to draw outliers by specifying outlier.shape = NA.
R
ggplot(data = surveys_complete, mapping = aes(x = species_id, y = weight)) + geom_boxplot(outlier.shape = NA) + geom_jitter(alpha = 0.3, color = "tomato")

Notice how the boxplot layer is behind the jitter layer? What do you need to change in the code to put the boxplot in front of the points such that it’s not hidden?
Challenges
Boxplots are useful summaries, but hide the shape of the distribution. For example, if there is a bimodal distribution, it would not be observed with a boxplot. An alternative to the boxplot is the violin plot (sometimes known as a beanplot), where the shape (of the density of points) is drawn.
- Replace the box plot with a violin plot; see geom_violin().
R
ggplot(data = surveys_complete, mapping = aes(x = species_id, y = weight)) +
  geom_jitter(alpha = 0.3, color = "tomato") +
  geom_violin()

Challenges (continued)
In many types of data, it is important to consider the scale of the observations. For example, it may be worth changing the scale of the axis to better distribute the observations in the space of the plot. Changing the scale of the axes is done similarly to adding/modifying other components (i.e., by incrementally adding commands). Try making these modifications:
- Represent weight on the log10 scale; see scale_y_log10().
R
ggplot(data = surveys_complete, mapping = aes(x = species_id, y = weight)) +
  scale_y_log10() +
  geom_jitter(alpha = 0.3, color = "tomato") +
  geom_boxplot(outlier.shape = NA)

Challenges (continued)
So far, we’ve looked at the distribution of weight within species. Try making a new plot to explore the distribution of another variable within each species.
- Create a boxplot for hindfoot_length. Overlay the boxplot layer on a jitter layer to show actual measurements.
R
ggplot(data = surveys_complete, mapping = aes(x = species_id, y = hindfoot_length)) +
  geom_jitter(alpha = 0.3, color = "tomato") +
  geom_boxplot(outlier.shape = NA)

Challenges (continued)
- Add color to the data points on your boxplot according to the plot from which the sample was taken (plot_id).
Hint: Check the class for plot_id. Consider changing the class of plot_id from integer to factor. Why does this change how R makes the graph?
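As a sketch of what the hint is pointing at, the example below maps the same column to color twice: once as an integer and once converted with as.factor(). The toy data frame is hypothetical and only mimics the relevant columns of surveys_complete.

```r
library(ggplot2)

# Hypothetical toy data with an integer plot_id column
toy <- data.frame(
  species_id      = c("A", "A", "A", "B", "B", "B"),
  hindfoot_length = c(20, 22, 21, 35, 36, 34),
  plot_id         = c(1L, 2L, 3L, 1L, 2L, 3L)
)

class(toy$plot_id)

# Integer plot_id: ggplot2 treats it as continuous and uses a color gradient
p_gradient <- ggplot(toy, aes(x = species_id, y = hindfoot_length)) +
  geom_jitter(aes(color = plot_id))

# Factor plot_id: ggplot2 treats it as discrete and gives each plot its own color
p_discrete <- ggplot(toy, aes(x = species_id, y = hindfoot_length)) +
  geom_jitter(aes(color = as.factor(plot_id)))

p_gradient
p_discrete
```

Plot identifiers are labels rather than quantities, so the discrete version is usually the more honest encoding.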
Plotting time series data
Let’s calculate the number of records per year for each genus. First we need to group the data and count records within each group:
R
yearly_counts <- surveys_complete %>% count(year, genus)
Timelapse data can be visualized as a line plot with years on the x-axis and counts on the y-axis:
R
ggplot(data = yearly_counts, aes(x = year, y = n)) + geom_line()

Unfortunately, this does not work because we plotted data for all the genera together. We need to tell ggplot to draw a line for each genus by modifying the aesthetic function to include group = genus:
R
ggplot(data = yearly_counts, aes(x = year, y = n, group = genus)) + geom_line()

We will be able to distinguish genera in the plot if we add colors (using color also automatically groups the data):
R
ggplot(data = yearly_counts, aes(x = year, y = n, color = genus)) + geom_line()

Integrating the pipe operator with ggplot2
In the previous lesson, we saw how to use the pipe operator %>% to use different functions in a sequence and create a coherent workflow. We can also use the pipe operator to pass the data argument to the ggplot() function. The hard part is to remember that to build your ggplot, you need to use + and not %>%.
R
yearly_counts %>% ggplot(mapping = aes(x = year, y = n, color = genus)) + geom_line()

The pipe operator can also be used to link data manipulation with subsequent data visualization.
R
yearly_counts_graph <- surveys_complete %>%
  count(year, genus) %>%
  ggplot(mapping = aes(x = year, y = n, color = genus)) +
  geom_line()

yearly_counts_graph

Faceting
ggplot has a special technique called faceting that allows the user to split one plot into multiple plots based on a factor included in the dataset. We will use it to make a time series plot for each genus:
R
ggplot(data = yearly_counts, aes(x = year, y = n)) + geom_line() + facet_wrap(facets = vars(genus))

Now we would like to split the line in each plot by the sex of each individual measured. To do that we need to make counts in the data frame grouped by year, genus, and sex:
R
yearly_sex_counts <- surveys_complete %>% count(year, genus, sex)
We can now make the faceted plot by splitting further by sex using color (within a single plot):
R
ggplot(data = yearly_sex_counts, mapping = aes(x = year, y = n, color = sex)) + geom_line() + facet_wrap(facets = vars(genus))

We can also facet both by sex and genus:
R
ggplot(data = yearly_sex_counts, mapping = aes(x = year, y = n, color = sex)) + geom_line() + facet_grid(rows = vars(sex), cols = vars(genus))

You can also organise the panels only by rows (or only by columns):
R
# One column, facet by rows
ggplot(data = yearly_sex_counts, mapping = aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_grid(rows = vars(genus))

R
# One row, facet by column
ggplot(data = yearly_sex_counts, mapping = aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_grid(cols = vars(genus))

Note: ggplot2 before version 3.0.0 used formulas to specify how plots are faceted. If you encounter facet_grid/wrap(...) code containing ~, please read https://ggplot2.tidyverse.org/news/#tidy-evaluation.
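For readers who do meet the formula style in older code, the sketch below puts the two notations side by side. In the facet_grid() formula the left-hand side gives the rows and the right-hand side gives the columns. The toy data frame is hypothetical and stands in for yearly_sex_counts.

```r
library(ggplot2)

# Hypothetical toy data standing in for yearly_sex_counts
toy <- data.frame(
  year  = rep(c(1990, 1991), times = 4),
  n     = c(5, 8, 6, 4, 7, 9, 3, 6),
  genus = rep(c("Dipodomys", "Perognathus"), each = 4),
  sex   = rep(rep(c("F", "M"), each = 2), times = 2)
)

base_plot <- ggplot(data = toy, mapping = aes(x = year, y = n)) +
  geom_line()

# Older formula notation
wrap_formula <- base_plot + facet_wrap(~ genus)
grid_formula <- base_plot + facet_grid(sex ~ genus)

# vars() notation used in this lesson; these specify the same layouts
wrap_vars <- base_plot + facet_wrap(facets = vars(genus))
grid_vars <- base_plot + facet_grid(rows = vars(sex), cols = vars(genus))

grid_formula
grid_vars
```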
ggplot2 themes
Usually plots with white background look more readable when printed. Every single component of a ggplot graph can be customized using the generic theme() function, as we will see below. However, there are pre-loaded themes available that change the overall appearance of the graph without much effort.
For example, we can change our previous graph to have a simpler white background using the theme_bw() function:
R
ggplot(data = yearly_sex_counts, mapping = aes(x = year, y = n, color = sex)) + geom_line() + facet_wrap(vars(genus)) + theme_bw()

In addition to theme_bw(), which changes the plot background to white, ggplot2 comes with several other themes which can be useful to quickly change the look of your visualization. The complete list of themes is available at https://ggplot2.tidyverse.org/reference/ggtheme.html. theme_minimal() and theme_light() are popular, and theme_void() can be useful as a starting point to create a new hand-crafted theme.
The ggthemes package provides a wide variety of options.
Challenge
Use what you just learned to create a plot that depicts how the average weight of each species changes through the years.
R
yearly_weight <- surveys_complete %>% group_by(year, species_id) %>% summarize(avg_weight = mean(weight))
OUTPUT
#> `summarise()` has grouped output by 'year'. You can override using the
#> `.groups` argument.
R
ggplot(data = yearly_weight, mapping = aes(x=year, y=avg_weight)) + geom_line() + facet_wrap(vars(species_id)) + theme_bw()
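The message printed by summarize() in the solution above is informational: the result is still grouped by year. As a minimal sketch, passing .groups = "drop" returns an ungrouped data frame and silences the message. The toy data frame and the name yearly_weight_toy are hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for surveys_complete
toy <- data.frame(
  year       = c(1990, 1990, 1991, 1991),
  species_id = c("DM", "DM", "DM", "DM"),
  weight     = c(40, 44, 42, 46)
)

# .groups = "drop" removes all grouping from the summarized result
yearly_weight_toy <- toy %>%
  group_by(year, species_id) %>%
  summarize(avg_weight = mean(weight), .groups = "drop")

yearly_weight_toy
```

The plot itself is unaffected either way; the argument only controls whether later dplyr steps on the result would still operate by group.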

Customization
Take a look at the ggplot2 cheat sheet, and think of ways you could improve the plot.
Now, let’s change names of axes to something more informative than ‘year’ and ‘n’ and add a title to the figure:
R
ggplot(data = yearly_sex_counts, aes(x = year, y = n, color = sex)) + geom_line() + facet_wrap(vars(genus)) + labs(title = "Observed genera through time", x = "Year of observation", y = "Number of individuals") + theme_bw()

The axes have more informative names, but their readability can be improved by increasing the font size. This can be done with the generic theme() function:
R
ggplot(data = yearly_sex_counts, mapping = aes(x = year, y = n, color = sex)) + geom_line() + facet_wrap(vars(genus)) + labs(title = "Observed genera through time", x = "Year of observation", y = "Number of individuals") + theme_bw() + theme(text=element_text(size = 16))

Note that it is also possible to change the fonts of your plots. If you are on Windows, you may have to install the extrafont package, and follow the instructions included in the README for this package.
After our manipulations, you may notice that the values on the x-axis are still not properly readable. Let’s change the orientation of the labels and adjust them vertically and horizontally so they don’t overlap. You can use a 90 degree angle, or experiment to find the appropriate angle for diagonally oriented labels. We can also modify the facet label text (strip.text) to italicize the genus names:
R
ggplot(data = yearly_sex_counts, mapping = aes(x = year, y = n, color = sex)) + geom_line() + facet_wrap(vars(genus)) + labs(title = "Observed genera through time", x = "Year of observation", y = "Number of individuals") + theme_bw() + theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90, hjust = 0.5, vjust = 0.5), axis.text.y = element_text(colour = "grey20", size = 12), strip.text = element_text(face = "italic"), text = element_text(size = 16))

If you like the changes you created better than the default theme, you can save them as an object to be able to easily apply them to other plots you may create:
R
grey_theme <- theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90, hjust = 0.5, vjust = 0.5),
                    axis.text.y = element_text(colour = "grey20", size = 12),
                    text = element_text(size = 16))

ggplot(surveys_complete, aes(x = species_id, y = hindfoot_length)) +
  geom_boxplot() +
  grey_theme

Challenge
With all of this information in hand, please take another five minutes to either improve one of the plots generated in this exercise or create a beautiful graph of your own. Use the RStudio ggplot2 cheat sheet for inspiration.
Here are some ideas:
- See if you can change the thickness of the lines.
- Can you find a way to change the name of the legend? What about its labels?
- Try using a different color palette (see https://r-graphics.org/chapter-colors).
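One possible way to approach the three ideas above is sketched below; it is only one of several options. The toy data frame is hypothetical, the palette choice "Dark2" is arbitrary, and the linewidth argument assumes ggplot2 3.4.0 or later (older versions use size for line thickness).

```r
library(ggplot2)

# Hypothetical toy data standing in for yearly_sex_counts
toy <- data.frame(
  year = rep(1990:1992, times = 2),
  n    = c(5, 8, 6, 4, 7, 9),
  sex  = rep(c("F", "M"), each = 3)
)

p <- ggplot(data = toy, mapping = aes(x = year, y = n, color = sex)) +
  # Thicker lines (assumes ggplot2 >= 3.4.0; use size = 1.5 in older versions)
  geom_line(linewidth = 1.5) +
  # Legend title, legend labels, and a ColorBrewer palette in a single scale call
  scale_color_brewer(name = "Sex", labels = c("Female", "Male"), palette = "Dark2")

p
```

Setting the name, labels, and palette in one scale_color_*() call keeps everything about the color legend in one place.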
Arranging plots
Faceting is a great tool for splitting one plot into multiple plots, but sometimes you may want to produce a single figure that contains multiple plots using different variables or even different data frames. The patchwork package allows us to combine separate ggplots into a single figure while keeping everything aligned properly. Like most R packages, we can install patchwork from CRAN, the R package repository:
R
install.packages("patchwork")
After you have loaded the patchwork package you can use + to place plots next to each other, / to arrange them vertically, and plot_layout() to determine how much space each plot uses:
R
library(patchwork)

plot_weight <- ggplot(data = surveys_complete, aes(x = species_id, y = weight)) +
  geom_boxplot() +
  labs(x = "Species", y = expression(log[10](Weight))) +
  scale_y_log10()

plot_count <- ggplot(data = yearly_counts, aes(x = year, y = n, color = genus)) +
  geom_line() +
  labs(x = "Year", y = "Abundance")

plot_weight / plot_count + plot_layout(heights = c(3, 2))

You can also use parentheses () to create more complex layouts. There are many useful examples on the patchwork website.
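As a small sketch of a parenthesized layout, the example below places two plots side by side in the top row and a third plot across the bottom. The toy data and the three plot objects are hypothetical.

```r
library(ggplot2)
library(patchwork)

# Hypothetical toy data
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 6))

p_points <- ggplot(toy, aes(x = x, y = y)) + geom_point()
p_line   <- ggplot(toy, aes(x = x, y = y)) + geom_line()
p_bar    <- ggplot(toy, aes(x = x, y = y)) + geom_col()

# Parentheses group the first two plots into one row,
# and / stacks that row on top of the third plot
combined <- (p_points + p_line) / p_bar

combined
```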
Exporting plots
After creating your plot, you can save it to a file in your favorite format. The Export tab in the Plot pane in RStudio will save your plots at low resolution, which will not be accepted by many journals and will not scale well for posters.
The ggplot2 extensions website provides a list of packages that extend the capabilities of ggplot2, including additional themes.
Instead of the Export tab, use the ggsave() function, which allows you to easily change the dimension and resolution of your plot by adjusting the appropriate arguments (width, height and dpi):
R
my_plot <- ggplot(data = yearly_sex_counts, aes(x = year, y = n, color = sex)) +
  geom_line() +
  facet_wrap(vars(genus)) +
  labs(title = "Observed genera through time", x = "Year of observation", y = "Number of individuals") +
  theme_bw() +
  theme(axis.text.x = element_text(colour = "grey20", size = 12, angle = 90, hjust = 0.5, vjust = 0.5),
        axis.text.y = element_text(colour = "grey20", size = 12),
        text = element_text(size = 16))

ggsave("name_of_file.png", my_plot, width = 15, height = 10)

## This also works for plots combined with patchwork
plot_combined <- plot_weight / plot_count + plot_layout(heights = c(3, 2))
ggsave("plot_combined.png", plot_combined, width = 10, dpi = 300)
Note: The parameters width and height also determine the font size in the saved plot.
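The effect described in the note can be seen by saving the same plot at two sizes, as in the sketch below. Text sizes are specified in points, so the text keeps its physical size while the canvas grows, and it therefore looks smaller relative to the plot on the larger canvas. The toy data, file names, and sizes are hypothetical, and the files are written to a temporary directory.

```r
library(ggplot2)

# Hypothetical toy data
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 6))

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  labs(title = "Same plot, two canvas sizes")

# Write to a temporary directory so the example leaves no files behind
small_file <- file.path(tempdir(), "plot_small.png")
large_file <- file.path(tempdir(), "plot_large.png")

# width and height are in inches by default
ggsave(small_file, p, width = 4, height = 3, dpi = 150)
ggsave(large_file, p, width = 12, height = 9, dpi = 150)
```

Opening the two files side by side shows the title and axis text taking up a much smaller share of the larger image.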
Key Points
- start simple and build your plots iteratively
- the ggplot() function initiates a plot, and geom_ functions add representations of your data
- use aes() when mapping a variable from the data to a part of the plot
- use facet_ to partition a plot into multiple plots based on a factor included in the dataset
- use premade theme_ functions to broadly change appearance, and the theme() function to fine-tune
- the patchwork library can combine separate plots into a single figure
- use ggsave() to save plots in your favorite format and dimensions