calculations • OCepi

add_percent()

One of the most used calculations is proportion. If you do not want to multiply proportion by 100, add argument multiply = FALSE.

linelist |>
  mutate(race_ethnicity = recode_race(Ethnicity, Race, abbr_names = TRUE)) |>
  count(race_ethnicity) |>
  mutate(percent = add_percent(n, digits = 1))
#> # A tibble: 9 × 3
#>   race_ethnicity             n percent
#>   <chr>                  <int>   <dbl>
#> 1 AI/AN                     10     9.5
#> 2 Asian                      7     6.7
#> 3 Black/African American    16    15.2
#> 4 Hispanic/Latinx           19    18.1
#> 5 Missing/Unknown            8     7.6
#> 6 Multiple Races             6     5.7
#> 7 NHOPI                     10     9.5
#> 8 Other                     15    14.3
#> 9 White                     14    13.3

n_percent()

When preparing data for {ggplot2}, you may want to create labels with n and percent. Use argument n_suppress to suppress low values and reverse to set order of n and % (default is n then %).

linelist |>
  mutate(race_ethnicity = recode_race(Ethnicity, Race, abbr_names = TRUE)) |>
  count(race_ethnicity) |>
  mutate(
    percent = add_percent(n, digits = 1),
    label = n_percent(n, percent, reverse = TRUE, n_suppress = 10)
    )
#> # A tibble: 9 × 4
#>   race_ethnicity             n percent label     
#>   <chr>                  <int>   <dbl> <chr>     
#> 1 AI/AN                     10     9.5 9.5% (10) 
#> 2 Asian                      7     6.7 **        
#> 3 Black/African American    16    15.2 15.2% (16)
#> 4 Hispanic/Latinx           19    18.1 18.1% (19)
#> 5 Missing/Unknown            8     7.6 **        
#> 6 Multiple Races             6     5.7 **        
#> 7 NHOPI                     10     9.5 9.5% (10) 
#> 8 Other                     15    14.3 14.3% (15)
#> 9 White                     14    13.3 13.3% (14)

dashboard_tbl()

Pre-aggregating data for dashboarding can be a tedious and disorganized task. One way to aggregate and organize your data is to summarize all your data into one table. To use dashboard_tbl, start with your data, then select which columns you want to summarize, then call the function:

linelist %>%
  mutate(
    race_ethnicity = recode_race(Ethnicity, Race),
    agegroups = age_groups(Age),
    Gender = recode_gender(Gender)
    ) %>%
  select(Gender, agegroups, race_ethnicity) %>%
  dashboard_tbl()
#>          Variable                               Category  n Percent      Label
#> 1          Gender                                 Female 51    48.6 48.6% (51)
#> 2          Gender                                   Male 39    37.1 37.1% (39)
#> 3          Gender                        Missing/Unknown 15    14.3 14.3% (15)
#> 4       agegroups                                    0-9 12    11.4 11.4% (12)
#> 5       agegroups                                  10-19 11    10.5 10.5% (11)
#> 6       agegroups                                  20-29 11    10.5 10.5% (11)
#> 7       agegroups                                  30-39 13    12.4 12.4% (13)
#> 8       agegroups                                  40-49 12    11.4 11.4% (12)
#> 9       agegroups                                  50-59 12    11.4 11.4% (12)
#> 10      agegroups                                  60-69  6     5.7   5.7% (6)
#> 11      agegroups                                  70-79 17    16.2 16.2% (17)
#> 12      agegroups                                    80+ 11    10.5 10.5% (11)
#> 13      agegroups                        Missing/Unknown  0     0.0     0% (0)
#> 14 race_ethnicity          American Indian/Alaska Native 10     9.5  9.5% (10)
#> 15 race_ethnicity                                  Asian  7     6.7   6.7% (7)
#> 16 race_ethnicity                 Black/African American 16    15.2 15.2% (16)
#> 17 race_ethnicity                        Hispanic/Latinx 19    18.1 18.1% (19)
#> 18 race_ethnicity                        Missing/Unknown  8     7.6   7.6% (8)
#> 19 race_ethnicity                         Multiple Races  6     5.7   5.7% (6)
#> 20 race_ethnicity Native Hawaiian/Other Pacific Islander 10     9.5  9.5% (10)
#> 21 race_ethnicity                                  Other 15    14.3 14.3% (15)
#> 22 race_ethnicity                                  White 14    13.3 13.3% (14)

It is important to remember you only pass the columns you want summarized into the function, otherwise it will attempt to summarize all columns. If you’re wanting to summarize by group or year, use group_by and include that column in the group of columns you pass into the function. To adjust digits on percent via digits argument, and change n % order using reverse.

rate_per_100k()

Another common calculation for our team are is incidence rate per 100,000.

linelist |>
  mutate(race_ethnicity = recode_race(Ethnicity, Race, abbr_names = TRUE)) |>
  filter(race_ethnicity %in% c("White","NHOPI","Multiple Races")) |>
  count(race_ethnicity) |>
  mutate(incidence_rate = case_when(
    race_ethnicity == "White" ~ rate_per_100k(n, 1500000, digits = 1),
    race_ethnicity == "NHOPI" ~ rate_per_100k(n, 100000, digits = 1),
    race_ethnicity == "Multiple Races" ~ rate_per_100k(n, 200000, digits = 1)
    )
  )
#> # A tibble: 3 × 3
#>   race_ethnicity     n incidence_rate
#>   <chr>          <int>          <dbl>
#> 1 Multiple Races     6            3  
#> 2 NHOPI             10           10  
#> 3 White             14            0.9

time_between()

time_between() calculates the time elapsed between two dates in the following units: days, weeks, months and years. Value is always rounded down to account for complete elapsed time. Note: order of dates does not necessarily matter, but beware of sign.

episode_date = as.Date("2020-04-01")
death_date = as.Date("2020-05-15")

time_between(death_date, episode_date, unit = "days")
#> [1] 44
time_between(episode_date, death_date, unit = "days")
#> [1] -44

dob = as.Date("1986-04-01")
vaccine_date = as.Date("2023-05-15")

time_between(vaccine_date, dob, unit = "years")
#> [1] 37
time_between(dob, vaccine_date, unit = "years")
#> [1] -37