Skip to contents

This article will demonstrate the different ways we extend ggplot2 through themes, colors, and labels.

theme_apollo() & apollo_label()

Using the built-in linelist dataset, we’ll build a plot using our theme, labels, and colors:

dis_x <- linelist

dis_x |>
  mutate(age_groups = age_groups(Age, type = "hcv")) |>
  count(age_groups) |>
  ggplot(aes(x = age_groups, y = n)) +
  geom_bar(stat = "identity", fill = cdcd_color("dodgers")) +
  theme_apollo() +
  apollo_label(aes(label = n), vjust = -0.3) +
  scale_y_continuous(expand = c(0,0), limits = c(0,50)) +
  labs(
    title = "Disease X by Age Group",
    subtitle = "PHS/Communicable Disease Control",
    x = "Age Group (years)",
    y = "Total Cases",
    caption = "*This is a caption."
  )

For horizontal plots or maps, update theme_apollo(direction = "horizontal") or theme_apollo(direction = "map") respectively.

cdcd_color()

Color palette below contains our favorite/most used colors:

end_points()

For line plots with >1 group, it is recommended to direct label groups. end_points() will subset the data to the last date in a time series, even if groups end at different dates (thank you Butte County for the suggestion).

covid <- read.csv("https://data.chhs.ca.gov/dataset/f333528b-4d38-4814-bebb-12db1f10f535/resource/046cdd2b-31e5-4d34-9ed3-b48cdbc4be7a/download/covid19cases_test.csv", na.strings = "", stringsAsFactors = FALSE) |>
  filter(area %in% c("Orange","Los Angeles","San Diego"))

covid <- covid |>
  group_by(area) |>
  mutate(
    date = as.Date(date, "%Y-%m-%d"),
    rate = rate_per_100k(cases, population, digits = 1),
    rate_ma_7 = round(zoo::rollmean(rate, k = 7, align = "right", na.pad = FALSE, fill = 0), digits = 2)
  ) |>
  ungroup() |>
  filter(date <= "2020-12-23", date > "2020-08-01")

ggplot(data = covid, aes(x = date, y = rate_ma_7, color = area)) +
  geom_line(linewidth = 1.2) +
  theme_apollo(legend = "Hide") +
  geom_text(data = end_points(covid, date = date), aes(label = area), hjust = -0.05, show.legend = FALSE, size = 4.5) +
  scale_x_date(date_labels = "%m/%Y", date_breaks = "1 month", expand = expansion(add = c(0,15))) +
  scale_y_continuous(expand = c(0,0), limits = c(0,155)) +
  labs(
    x = "Episode Date",
    y = "Crude Rate per 100k",
    color = "County"
  ) +
  scale_color_manual(values = cdcd_color("dodgers","mustard","london pink"))

wrap_labels()

For categories with long titles (e.g. race/ethnicity), you may need to wrap text to better fit under/next to the axis. Functions like scales::label_wrap() are very useful in wrapping long labels via width argument. Our function will wrap label at whatever delimiter you specify (e.g. or, forward slash, hyphen, etc.)

Without wrapping:

re <- data.frame(group = c("Native Hawaiian or Other Pacific Islander","Black or African American","American Indian/Alaska Native"), score = c(89.5, 84, 73))

ggplot(data = re, aes(x = group, y = score, label = score)) +
  geom_col() +
  theme_apollo() +
  scale_y_continuous(expand = c(0,0), limits = c(0,100)) +
  labs(
    x = "Race/Ethnicity",
    y = "Percentage (%)"
  ) +
  apollo_label(vjust = -0.3)

With wrapping:

ggplot(data = re, aes(x = group, y = score, label = score)) +
  geom_col() +
  scale_x_discrete(labels = wrap_labels(delim = c("or","/"))) +
  theme_apollo() +
  scale_y_continuous(expand = c(0,0), limits = c(0,100)) +
    labs(
    x = "Race/Ethnicity",
    y = "Percentage (%)"
  ) +
  apollo_label(vjust = -0.3)

geom_dumbbell()

Dumbbell plots consist of two points connected by a solid line. Useful in showing changes in pre/post test scores or timelines. Dataset requires two x variables (x and xend) and y variable (usually character/factor).

  • To customize the points, use colour_x, colour_end, and size.
  • To customize line, use colour and linewidth.
pre_post <- data.frame(Country = c("UK-A","US-A","UK-B","US-B"), PRE = c(0,1,2,3), POST = c(4,5,6,7))

ggplot(data = pre_post, aes(x = PRE, xend = POST, y = Country)) +
  geom_dumbbell(colour_x = cdcd_color("mustard"), colour_xend = cdcd_color("mustard"), size = 5, linewidth = 2, colour = "black") +
  labs(
    x = "Pre/Post Scores",
    y = "Country"
    ) +
  theme_apollo()

geom_lollipop()

Lollipop plot is similar to dumbbell plot, but has only one point.

  • To customize the stem, use line.colour, linewidth, and linetype (default stem color is black.)
  • To customize point, use size and color.
  • For most plots, set yend = 0.
ggplot(data = pre_post, aes(x = Country, y = POST, yend = 0)) +
  geom_lollipop(color = cdcd_color("dodgers"), line.colour = "#000000", size = 5, linewidth = 1) +
  labs(
    x = "Country",
    y = "Post Score"
    ) +
  theme_apollo()

An alternate use case for geom_lollipop() is labeling specific dates in time series plot.

set.seed(1234)
df <- data.frame(
  type = "a",
  date = seq.Date(from = as.Date("2023-01-01"), to = as.Date("2023-12-01"), by = "month"),
  value = c(1,2,1,3,4,6,8,10,14,12,9,8),
  stringsAsFactors = FALSE
  )

peak_date = as.Date("2023-07-01")
peak_n = 8

ggplot(df, aes(x = date, y = value)) +
  geom_line(linewidth = 1) +
  annotate("text", x = peak_date, y = peak_n * 0.6, label = "Intervention Started\nHere.", size = 5) +
  geom_lollipop(data = filter(df, date == peak_date), aes(x = date, y = value, yend = value * 0.7), color = cdcd_color("london pink"), line.colour = "#000000", linetype = "dotted", size = 2.5) +
    scale_x_date(expand = expansion(add = c(0,10)), breaks = "month", date_labels = "%b-%y") +
  scale_y_continuous(expand = expansion(add = c(0,10))) +
  theme_apollo()

highlight_geom() & desaturate_geom()

When making data visualizations, emphasizing data points through highlighting/fading may help the viewer see the take home message. {OCepi} providers two ways to do this: highlight_geom() and desaturate_geom(). highlight_geom() requires two basic arguments - 1) an expression (similar to what you’d use in dplyr::filter()), and 2) a color for highlighting. Although sensible defaults are built-in, the following additional arguments within highlight_geom() can be customized:

Please note: the default fade color/fill for highlight_geom() is light grey (#cccccc). To override, add fill/color to geom_* (e.g. geom_line(color = "black").

desaturate_geom() requires the same two basic arguments as highlight_geom() plus desaturate (range 0-1, 1 highest level of desaturation). Instead of fading non-emphasized categories to gray, they will retain color but be desaturated. Options to customize include:

Both highlighting approaches work with facet_wrap() and facet_grid(). Currently works with geom_col()/geom_bar(), geom_line(), geom_sf(), and geom_point(). Please note: your labels will be highlighted if you place the text/label function before the highlight/desaturate function. If you don’t want your labels highlighted, placed labels after highlight/desaturate function.

Bar - Highlight

tbl <- linelist |>
  mutate(age_groups = age_groups(Age, type = "enteric")) |>
  count(age_groups) |>
  mutate(
    percent = add_percent(n, digits = 1),
    label = n_percent(n, percent, reverse = TRUE)
  )

ggplot(data = tbl, aes(x = age_groups, y = percent)) +
  geom_col() +
  apollo_label(data = tbl, aes(label = label), vjust = -0.3) +
  highlight_geom(percent == max(percent), pal = cdcd_color("london pink")) +
  scale_y_continuous(expand = c(0,0), limits = c(0,32), label = scales::label_percent(scale = 1)) +
  theme_apollo() +
  labs(
    x = "Age Groups (years)",
    y = "Proportion (%)"
  )

Line - highlight

ggplot(data = covid, aes(x = date, y = rate_ma_7, group = area)) +
  geom_line(linewidth = 1.2) +
  theme_apollo(legend = "Hide") +
  apollo_label(data = end_points(covid, date = date), aes(label = area), hjust = -0.05) +
  highlight_geom(area == "Orange", pal = cdcd_color("orange")) +
  scale_x_date(date_labels = "%m/%Y", date_breaks = "1 month", expand = expansion(add = c(0,15))) +
  scale_y_continuous(expand = c(0,0), limits = c(0,155)) +
  labs(
    x = "Episode Date",
    y = "Crude Rate per 100k",
    color = "County"
  )

Shapefile/Map - desaturate

base_zip <- oc_zip_sf

ggplot(data = oc_zip_sf) +
  geom_sf() +
  desaturate_geom(Zip %in% c(92702:92708), pal = cdcd_color("dodgers"), desaturate = 0.75, linewidth = 0.5) +
  geom_sf_text(data = base_zip, aes(label = Zip)) +
  theme_apollo(direction = "map")