Introduction to PHEindicatormethods

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

library(PHEindicatormethods)
library(dplyr)

Package functions

This vignette covers the following core functions available within PHEindicatormethods:

Function Type Description
phe_proportion Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_rate Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_mean Aggregate Performs a calculation on each grouping set
calculate_dsr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRatio Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRate Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs

Other functions are introduced in separate vignettes.

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

df <- data.frame(
        area = rep(c("Area1","Area2","Area3","Area4"), 2),
        year = rep(2015:2016, each = 4),
        obs = sample(100, 2 * 4, replace = TRUE),
        pop = sample(100:200, 2 * 4, replace = TRUE))
df
#>    area year obs pop
#> 1 Area1 2015   1 197
#> 2 Area2 2015  71 198
#> 3 Area3 2015  76 129
#> 4 Area4 2015  60 122
#> 5 Area1 2016  64 182
#> 6 Area2 2016  16 194
#> 7 Area3 2016  69 200
#> 8 Area4 2016  52 125

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop       value      lowercl   uppercl confidence
#> 1 Area1 2015   1 197 0.005076142 0.0008966264 0.0281883        95%
#> 2 Area2 2015  71 198 0.358585859 0.2950605233 0.4274940        95%
#> 3 Area3 2015  76 129 0.589147287 0.5028661010 0.6702726        95%
#> 4 Area4 2015  60 122 0.491803279 0.4047061621 0.5794008        95%
#> 5 Area1 2016  64 182 0.351648352 0.2859970732 0.4234327        95%
#> 6 Area2 2016  16 194 0.082474227 0.0514016685 0.1297609        95%
#> 7 Area3 2016  69 200 0.345000000 0.2825979438 0.4132441        95%
#> 8 Area4 2016  52 125 0.416000000 0.3333590675 0.5036499        95%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence = 99.8)
#>    area year obs pop       value      lowercl    uppercl confidence
#> 1 Area1 2015   1 197 0.005076142 0.0004430205 0.05547352      99.8%
#> 2 Area2 2015  71 198 0.358585859 0.2620151420 0.46816975      99.8%
#> 3 Area3 2015  76 129 0.589147287 0.4536921053 0.71231350      99.8%
#> 4 Area4 2015  60 122 0.491803279 0.3576999292 0.62709667      99.8%
#> 5 Area1 2016  64 182 0.351648352 0.2521750632 0.46591353      99.8%
#> 6 Area2 2016  16 194 0.082474227 0.0393419348 0.16478300      99.8%
#> 7 Area3 2016  69 200 0.345000000 0.2503386597 0.45378858      99.8%
#> 8 Area4 2016  52 125 0.416000000 0.2905149497 0.55340870      99.8%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson

# specify multiplier to output proportions as percentages
phe_proportion(df, obs, pop, multiplier = 100)
#>    area year obs pop      value     lowercl  uppercl confidence  statistic
#> 1 Area1 2015   1 197  0.5076142  0.08966264  2.81883        95% percentage
#> 2 Area2 2015  71 198 35.8585859 29.50605233 42.74940        95% percentage
#> 3 Area3 2015  76 129 58.9147287 50.28661010 67.02726        95% percentage
#> 4 Area4 2015  60 122 49.1803279 40.47061621 57.94008        95% percentage
#> 5 Area1 2016  64 182 35.1648352 28.59970732 42.34327        95% percentage
#> 6 Area2 2016  16 194  8.2474227  5.14016685 12.97609        95% percentage
#> 7 Area3 2016  69 200 34.5000000 28.25979438 41.32441        95% percentage
#> 8 Area4 2016  52 125 41.6000000 33.33590675 50.36499        95% percentage
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify multiplier for proportion, confidence level and remove metadata columns
phe_proportion(df, obs, pop, confidence = 99.8, multiplier = 100, type = "standard")
#>    area year obs pop      value     lowercl   uppercl
#> 1 Area1 2015   1 197  0.5076142  0.04430205  5.547352
#> 2 Area2 2015  71 198 35.8585859 26.20151420 46.816975
#> 3 Area3 2015  76 129 58.9147287 45.36921053 71.231350
#> 4 Area4 2015  60 122 49.1803279 35.76999292 62.709667
#> 5 Area1 2016  64 182 35.1648352 25.21750632 46.591353
#> 6 Area2 2016  16 194  8.2474227  3.93419348 16.478300
#> 7 Area3 2016  69 200 34.5000000 25.03386597 45.378858
#> 8 Area4 2016  52 125 41.6000000 29.05149497 55.340870

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop      value     lowercl   uppercl confidence
#> 1 Area1 2015   1 197   507.6142    12.85168  2828.245        95%
#> 2 Area2 2015  71 198 35858.5859 28004.68471 45231.369        95%
#> 3 Area3 2015  76 129 58914.7287 46416.38394 73741.534        95%
#> 4 Area4 2015  60 122 49180.3279 37527.67173 63305.921        95%
#> 5 Area1 2016  64 182 35164.8352 27079.85468 44905.428        95%
#> 6 Area2 2016  16 194  8247.4227  4711.08707 13394.040        95%
#> 7 Area3 2016  69 200 34500.0000 26841.88252 43662.599        95%
#> 8 Area4 2016  52 125 41600.0000 31066.63356 54553.968        95%
#>         statistic method
#> 1 rate per 100000  Exact
#> 2 rate per 100000  Byars
#> 3 rate per 100000  Byars
#> 4 rate per 100000  Byars
#> 5 rate per 100000  Byars
#> 6 rate per 100000  Byars
#> 7 rate per 100000  Byars
#> 8 rate per 100000  Byars

# specify multiplier for rate and confidence level
phe_rate(df, obs, pop, confidence = 99.8, multiplier = 100)
#>    area year obs pop      value      lowercl   uppercl confidence    statistic
#> 1 Area1 2015   1 197  0.5076142 5.078682e-04  4.687012      99.8% rate per 100
#> 2 Area2 2015  71 198 35.8585859 2.412038e+01 51.068120      99.8% rate per 100
#> 3 Area3 2015  76 129 58.9147287 4.020077e+01 82.952531      99.8% rate per 100
#> 4 Area4 2015  60 122 49.1803279 3.184853e+01 72.157570      99.8% rate per 100
#> 5 Area1 2016  64 182 35.1648352 2.311658e+01 50.994333      99.8% rate per 100
#> 6 Area2 2016  16 194  8.2474227 3.281938e+00 16.841054      99.8% rate per 100
#> 7 Area3 2016  69 200 34.5000000 2.306344e+01 49.374341      99.8% rate per 100
#> 8 Area4 2016  52 125 41.6000000 2.600267e+01 62.717603      99.8% rate per 100
#>   method
#> 1  Exact
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify multiplier for rate, confidence level and remove metadata columns
phe_rate(df, obs, pop, type = "standard", confidence = 99.8, multiplier = 100)
#>    area year obs pop      value      lowercl   uppercl
#> 1 Area1 2015   1 197  0.5076142 5.078682e-04  4.687012
#> 2 Area2 2015  71 198 35.8585859 2.412038e+01 51.068120
#> 3 Area3 2015  76 129 58.9147287 4.020077e+01 82.952531
#> 4 Area4 2015  60 122 49.1803279 3.184853e+01 72.157570
#> 5 Area1 2016  64 182 35.1648352 2.311658e+01 50.994333
#> 6 Area2 2016  16 194  8.2474227 3.281938e+00 16.841054
#> 7 Area3 2016  69 200 34.5000000 2.306344e+01 49.374341
#> 8 Area4 2016  52 125 41.6000000 2.600267e+01 62.717603

These functions can also return aggregate data if the input dataframes are grouped:

# default proportion - grouped
df %>%
  group_by(year) %>%
  phe_proportion(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop value lowercl uppercl confidence statistic       method
#>   <int> <int> <int> <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   208   646 0.322   0.287   0.359 95%        proportion of 1 Wilson
#> 2  2016   201   701 0.287   0.254   0.321 95%        proportion of 1 Wilson

# default rate - grouped
df %>%
  group_by(year) %>%
  phe_rate(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop  value lowercl uppercl confidence statistic       method
#>   <int> <int> <int>  <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   208   646 32198.  27971.  36884. 95%        rate per 100000 Byars 
#> 2  2016   201   701 28673.  24846.  32923. 95%        rate per 100000 Byars



Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

# default mean
phe_mean(df,obs)
#>   value_sum value_count    stdev  value  lowercl  uppercl confidence statistic
#> 1       409           8 27.58073 51.125 28.06694 74.18306        95%      mean
#>                     method
#> 1 Student's t-distribution

# multiple means in a single execution with 99.8% confidence
df %>%
    group_by(year) %>%
        phe_mean(obs, confidence = 0.998)
#> # A tibble: 2 × 10
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl confidence statistic
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl> <chr>      <chr>    
#> 1  2015       208           4  34.7  52    -125.     229. 99.8%      mean     
#> 2  2016       201           4  23.9  50.2   -71.9    172. 99.8%      mean     
#> # ℹ 1 more variable: method <chr>

# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
    group_by(year) %>%
        phe_mean(obs, type = "standard", confidence = 0.998)
#> # A tibble: 2 × 7
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl>
#> 1  2015       208           4  34.7  52    -125.     229.
#> 2  2016       201           4  23.9  50.2   -71.9    172.

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

df_std <- data.frame(
            area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
            year = rep(2006:2010, each = 19 * 2),
            sex = rep(rep(c("Male", "Female"), each = 19), 5),
            ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
                           50,55,60,65,70,75,80,85,90), times = 10),
            obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
            pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#>    area year  sex ageband obs   pop
#> 1 Area1 2006 Male       0 146 18698
#> 2 Area1 2006 Male       5  84 16357
#> 3 Area1 2006 Male      10 107 16938
#> 4 Area1 2006 Male      15  77 16600
#> 5 Area1 2006 Male      20  36 11298
#> 6 Area1 2006 Male      25 188 10454

Execute calculate_dsr

INPUT: The minimum input requirement for the calculate_dsr function is a single data frame with columns representing the numerators and denominators and standard populations for each standardisation category. The standard populations must be appended to the input data frame by the user prior to execution of the function. The 2013 European Standard Population is provided within the package in vector form (esp2013), which you can join to your dataset. Alternative standard populations can be used but must be provided by the user.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output. It is also possible to calculate CIs when we can’t assume events are independent - further details can be found in the DSR vignette.

Here are some example code chunks to demonstrate the calculate_dsr function and the arguments that can optionally be specified


# Append the standard populations to the data frame
# calculate separate dsrs for each area, year and sex
df_std %>%
    mutate(refpop = rep(esp2013, 40)) %>%
    group_by(area, year, sex) %>%
    calculate_dsr(obs,pop, stdpop = refpop)
#> # A tibble: 40 × 11
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        1593    293006  575.    546.    606. 95%       
#>  2 Area1  2006 Male          1917    268666  717.    683.    752. 95%       
#>  3 Area1  2007 Female        2228    240152 1009.    965.   1054. 95%       
#>  4 Area1  2007 Male          1643    266019  611.    579.    645. 95%       
#>  5 Area1  2008 Female        1744    303557  636.    605.    668. 95%       
#>  6 Area1  2008 Male          1782    279438  651.    619.    684. 95%       
#>  7 Area1  2009 Female        1944    293929  651.    620.    683. 95%       
#>  8 Area1  2009 Male          1687    292927  607.    576.    639. 95%       
#>  9 Area1  2010 Female        1897    280573  651.    619.    684. 95%       
#> 10 Area1  2010 Male          1743    297686  587.    557.    618. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>


# Append the standard populations to the data frame
# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    mutate(refpop = rep(esp2013, 40)) %>%
    group_by(area, year, sex) %>%
    calculate_dsr(obs,pop, stdpop = refpop, type = "standard")
#> # A tibble: 40 × 8
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        1593    293006  575.    546.    606.
#>  2 Area1  2006 Male          1917    268666  717.    683.    752.
#>  3 Area1  2007 Female        2228    240152 1009.    965.   1054.
#>  4 Area1  2007 Male          1643    266019  611.    579.    645.
#>  5 Area1  2008 Female        1744    303557  636.    605.    668.
#>  6 Area1  2008 Male          1782    279438  651.    619.    684.
#>  7 Area1  2009 Female        1944    293929  651.    620.    683.
#>  8 Area1  2009 Male          1687    292927  607.    576.    639.
#>  9 Area1  2010 Female        1897    280573  651.    619.    684.
#> 10 Area1  2010 Male          1743    297686  587.    557.    618.
#> # ℹ 30 more rows

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
  filter(ageband <= 70) %>%
  mutate(refpop = rep(esp2013[1:15], 40)) %>%
  group_by(area, year, sex) %>%
  calculate_dsr(obs, pop, stdpop = refpop)
#> # A tibble: 40 × 11
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        1298    238633  580.    549.    613. 95%       
#>  2 Area1  2006 Male          1532    216555  717.    681.    755. 95%       
#>  3 Area1  2007 Female        1907    195161 1042.    994.   1092. 95%       
#>  4 Area1  2007 Male          1268    208013  615.    580.    650. 95%       
#>  5 Area1  2008 Female        1516    236978  660.    627.    694. 95%       
#>  6 Area1  2008 Male          1452    219591  665.    631.    701. 95%       
#>  7 Area1  2009 Female        1473    231710  638.    605.    673. 95%       
#>  8 Area1  2009 Male          1233    227568  587.    554.    621. 95%       
#>  9 Area1  2010 Female        1280    229185  588.    555.    623. 95%       
#> 10 Area1  2010 Male          1356    231197  598.    566.    631. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

Execute calculate_ISRatio and calculate_ISRate

INPUT: These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

  • reference data provided as a data frame or as vectors - the data frame/vectors and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order.

  • reference data provided as columns within the input data frame - the reference numerators and denominators can be appended to the input data frame prior to execution of the function - if the data is grouped to generate multiple indirectly standardised rates or ratios then the reference data will need to be repeated and appended to the data rows for every grouping set.

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (ISRate only), the indirectly standardised rate or ratio, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

df_ref <- df_std %>%
    filter(year == 2006) %>%
    group_by(ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last")
    
head(df_ref)
#> # A tibble: 6 × 3
#>   ageband   obs    pop
#>     <dbl> <int>  <int>
#> 1       0  1159 125028
#> 2       5   897 137404
#> 3      10   700 111525
#> 4      15   711 109126
#> 5      20   632 106639
#> 6      25   972 110482

Here are some example code chunks to demonstrate the calculate_ISRatio function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl confidence
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female     1593    1910. 0.834   0.794   0.876 95%       
#>  2 Area1  2006 Male       1917    1775. 1.08    1.03    1.13  95%       
#>  3 Area1  2007 Female     2228    1602. 1.39    1.33    1.45  95%       
#>  4 Area1  2007 Male       1643    1735. 0.947   0.902   0.994 95%       
#>  5 Area1  2008 Female     1744    1979. 0.881   0.840   0.924 95%       
#>  6 Area1  2008 Male       1782    1848. 0.964   0.920   1.01  95%       
#>  7 Area1  2009 Female     1944    1930. 1.01    0.963   1.05  95%       
#>  8 Area1  2009 Male       1687    1910. 0.883   0.842   0.927 95%       
#>  9 Area1  2010 Female     1897    1818. 1.04    0.997   1.09  95%       
#> 10 Area1  2010 Male       1743    1963. 0.888   0.847   0.931 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same smrs by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, refobs, refpop, refpoptype = "field",
                      type = "standard")
#> # A tibble: 40 × 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     1593    1910. 0.834   0.794   0.876
#>  2 Area1  2006 Male       1917    1775. 1.08    1.03    1.13 
#>  3 Area1  2007 Female     2228    1602. 1.39    1.33    1.45 
#>  4 Area1  2007 Male       1643    1735. 0.947   0.902   0.994
#>  5 Area1  2008 Female     1744    1979. 0.881   0.840   0.924
#>  6 Area1  2008 Male       1782    1848. 0.964   0.920   1.01 
#>  7 Area1  2009 Female     1944    1930. 1.01    0.963   1.05 
#>  8 Area1  2009 Male       1687    1910. 0.883   0.842   0.927
#>  9 Area1  2010 Female     1897    1818. 1.04    0.997   1.09 
#> 10 Area1  2010 Male       1743    1963. 0.888   0.847   0.931
#> # ℹ 30 more rows

The calculate_ISRate function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate indirectly standardised rates for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 12
#> # Groups:   area, year, sex [40]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <chr> <int> <chr>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema…     1593    1910.     655.  546.    520.    574. 95%       
#>  2 Area1  2006 Male      1917    1775.     655.  708.    676.    740. 95%       
#>  3 Area1  2007 Fema…     2228    1602.     655.  911.    873.    950. 95%       
#>  4 Area1  2007 Male      1643    1735.     655.  620.    591.    651. 95%       
#>  5 Area1  2008 Fema…     1744    1979.     655.  577.    551.    605. 95%       
#>  6 Area1  2008 Male      1782    1848.     655.  632.    603.    662. 95%       
#>  7 Area1  2009 Fema…     1944    1930.     655.  660.    631.    690. 95%       
#>  8 Area1  2009 Male      1687    1910.     655.  579.    551.    607. 95%       
#>  9 Area1  2010 Fema…     1897    1818.     655.  683.    653.    715. 95%       
#> 10 Area1  2010 Male      1743    1963.     655.  582.    555.    610. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same indirectly standardised rates by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, refobs, refpop, refpoptype = "field",
                     type = "standard")
#> # A tibble: 40 × 9
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     1593    1910.     655.  546.    520.    574.
#>  2 Area1  2006 Male       1917    1775.     655.  708.    676.    740.
#>  3 Area1  2007 Female     2228    1602.     655.  911.    873.    950.
#>  4 Area1  2007 Male       1643    1735.     655.  620.    591.    651.
#>  5 Area1  2008 Female     1744    1979.     655.  577.    551.    605.
#>  6 Area1  2008 Male       1782    1848.     655.  632.    603.    662.
#>  7 Area1  2009 Female     1944    1930.     655.  660.    631.    690.
#>  8 Area1  2009 Male       1687    1910.     655.  579.    551.    607.
#>  9 Area1  2010 Female     1897    1818.     655.  683.    653.    715.
#> 10 Area1  2010 Male       1743    1963.     655.  582.    555.    610.
#> # ℹ 30 more rows