Introduction to PHEindicatormethods

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

library(PHEindicatormethods)
library(dplyr)

Package functions

This vignette covers the following functions available within the first release of the package (v1.0.8) but has been updated to apply to these functions in their latest release versions. If further functions are added to the package in future releases these will be explained elsewhere.

Function Type Description
phe_proportion Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_rate Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_mean Aggregate Performs a calculation on each grouping set
phe_dsr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRatio Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRate Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

df <- data.frame(
        area = rep(c("Area1","Area2","Area3","Area4"), 2),
        year = rep(2015:2016, each = 4),
        obs = sample(100, 2 * 4, replace = TRUE),
        pop = sample(100:200, 2 * 4, replace = TRUE))
df
#>    area year obs pop
#> 1 Area1 2015   5 145
#> 2 Area2 2015  73 184
#> 3 Area3 2015  73 129
#> 4 Area4 2015   4 142
#> 5 Area1 2016  16 114
#> 6 Area2 2016  87 125
#> 7 Area3 2016  16 118
#> 8 Area4 2016  55 141

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop      value    lowercl    uppercl confidence
#> 1 Area1 2015   5 145 0.03448276 0.01481722 0.07817743        95%
#> 2 Area2 2015  73 184 0.39673913 0.32885777 0.46884397        95%
#> 3 Area3 2015  73 129 0.56589147 0.47968015 0.64829195        95%
#> 4 Area4 2015   4 142 0.02816901 0.01100774 0.07018631        95%
#> 5 Area1 2016  16 114 0.14035088 0.08827407 0.21587576        95%
#> 6 Area2 2016  87 125 0.69600000 0.61051568 0.76979669        95%
#> 7 Area3 2016  16 118 0.13559322 0.08521666 0.20894806        95%
#> 8 Area4 2016  55 141 0.39007092 0.31349773 0.47247515        95%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence=99.8)
#>    area year obs pop      value     lowercl   uppercl confidence
#> 1 Area1 2015   5 145 0.03448276 0.009538616 0.1169550      99.8%
#> 2 Area2 2015  73 184 0.39673913 0.293046929 0.5106209      99.8%
#> 3 Area3 2015  73 129 0.56589147 0.431147591 0.6915522      99.8%
#> 4 Area4 2015   4 142 0.02816901 0.006822400 0.1089783      99.8%
#> 5 Area1 2016  16 114 0.14035088 0.067658665 0.2686399      99.8%
#> 6 Area2 2016  87 125 0.69600000 0.558758202 0.8054199      99.8%
#> 7 Area3 2016  16 118 0.13559322 0.065307615 0.2604445      99.8%
#> 8 Area4 2016  55 141 0.39007092 0.273999592 0.5200881      99.8%
#>         statistic method
#> 1 proportion of 1 Wilson
#> 2 proportion of 1 Wilson
#> 3 proportion of 1 Wilson
#> 4 proportion of 1 Wilson
#> 5 proportion of 1 Wilson
#> 6 proportion of 1 Wilson
#> 7 proportion of 1 Wilson
#> 8 proportion of 1 Wilson

# specify to output proportions as percentages
phe_proportion(df, obs, pop, multiplier=100)
#>    area year obs pop     value   lowercl   uppercl confidence  statistic method
#> 1 Area1 2015   5 145  3.448276  1.481722  7.817743        95% percentage Wilson
#> 2 Area2 2015  73 184 39.673913 32.885777 46.884397        95% percentage Wilson
#> 3 Area3 2015  73 129 56.589147 47.968015 64.829195        95% percentage Wilson
#> 4 Area4 2015   4 142  2.816901  1.100774  7.018631        95% percentage Wilson
#> 5 Area1 2016  16 114 14.035088  8.827407 21.587576        95% percentage Wilson
#> 6 Area2 2016  87 125 69.600000 61.051568 76.979669        95% percentage Wilson
#> 7 Area3 2016  16 118 13.559322  8.521666 20.894806        95% percentage Wilson
#> 8 Area4 2016  55 141 39.007092 31.349773 47.247515        95% percentage Wilson

# specify level of detail to output for proportion
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop     value    lowercl  uppercl confidence  statistic method
#> 1 Area1 2015   5 145  3.448276  0.9538616 11.69550      99.8% percentage Wilson
#> 2 Area2 2015  73 184 39.673913 29.3046929 51.06209      99.8% percentage Wilson
#> 3 Area3 2015  73 129 56.589147 43.1147591 69.15522      99.8% percentage Wilson
#> 4 Area4 2015   4 142  2.816901  0.6822400 10.89783      99.8% percentage Wilson
#> 5 Area1 2016  16 114 14.035088  6.7658665 26.86399      99.8% percentage Wilson
#> 6 Area2 2016  87 125 69.600000 55.8758202 80.54199      99.8% percentage Wilson
#> 7 Area3 2016  16 118 13.559322  6.5307615 26.04445      99.8% percentage Wilson
#> 8 Area4 2016  55 141 39.007092 27.3999592 52.00881      99.8% percentage Wilson

# specify level of detail to output for proportion and remove metadata columns
phe_proportion(df, obs, pop, confidence=99.8, multiplier=100, type="standard")
#>    area year obs pop     value    lowercl  uppercl
#> 1 Area1 2015   5 145  3.448276  0.9538616 11.69550
#> 2 Area2 2015  73 184 39.673913 29.3046929 51.06209
#> 3 Area3 2015  73 129 56.589147 43.1147591 69.15522
#> 4 Area4 2015   4 142  2.816901  0.6822400 10.89783
#> 5 Area1 2016  16 114 14.035088  6.7658665 26.86399
#> 6 Area2 2016  87 125 69.600000 55.8758202 80.54199
#> 7 Area3 2016  16 118 13.559322  6.5307615 26.04445
#> 8 Area4 2016  55 141 39.007092 27.3999592 52.00881

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop     value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015   5 145  3448.276  1119.6458  8047.126        95% rate per 100000
#> 2 Area2 2015  73 184 39673.913 31096.7782 49884.675        95% rate per 100000
#> 3 Area3 2015  73 129 56589.147 44355.0945 71153.334        95% rate per 100000
#> 4 Area4 2015   4 142  2816.901   767.5108  7212.386        95% rate per 100000
#> 5 Area1 2016  16 114 14035.088  8017.1131 22793.367        95% rate per 100000
#> 6 Area2 2016  87 125 69600.000 55745.1083 85852.300        95% rate per 100000
#> 7 Area3 2016  16 118 13559.322  7745.3465 22020.710        95% rate per 100000
#> 8 Area4 2016  55 141 39007.092 29383.5646 50774.057        95% rate per 100000
#>   method
#> 1  Exact
#> 2  Byars
#> 3  Byars
#> 4  Exact
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters
phe_rate(df, obs, pop, confidence=99.8, multiplier=100)
#>    area year obs pop     value    lowercl  uppercl confidence    statistic
#> 1 Area1 2015   5 145  3.448276  0.5099115 11.34810      99.8% rate per 100
#> 2 Area2 2015  73 184 39.673913 26.8450767 56.23698      99.8% rate per 100
#> 3 Area3 2015  73 129 56.589147 38.2906521 80.21398      99.8% rate per 100
#> 4 Area4 2015   4 142  2.816901  0.3017975 10.41841      99.8% rate per 100
#> 5 Area1 2016  16 114 14.035088  5.5850528 28.65934      99.8% rate per 100
#> 6 Area2 2016  87 125 69.600000 48.7830681 95.90284      99.8% rate per 100
#> 7 Area3 2016  16 118 13.559322  5.3957289 27.68783      99.8% rate per 100
#> 8 Area4 2016  55 141 39.007092 24.7315166 58.17287      99.8% rate per 100
#>   method
#> 1  Exact
#> 2  Byars
#> 3  Byars
#> 4  Exact
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify rate parameters and reduce columns output and remove metadata columns
phe_rate(df, obs, pop, type="standard", confidence=99.8, multiplier=100)
#>    area year obs pop     value    lowercl  uppercl
#> 1 Area1 2015   5 145  3.448276  0.5099115 11.34810
#> 2 Area2 2015  73 184 39.673913 26.8450767 56.23698
#> 3 Area3 2015  73 129 56.589147 38.2906521 80.21398
#> 4 Area4 2015   4 142  2.816901  0.3017975 10.41841
#> 5 Area1 2016  16 114 14.035088  5.5850528 28.65934
#> 6 Area2 2016  87 125 69.600000 48.7830681 95.90284
#> 7 Area3 2016  16 118 13.559322  5.3957289 27.68783
#> 8 Area4 2016  55 141 39.007092 24.7315166 58.17287

These functions can also return aggregate data if the input dataframes are grouped:

# default proportion - grouped
df %>%
  group_by(year) %>%
  phe_proportion(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop value lowercl uppercl confidence statistic       method
#>   <int> <int> <int> <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   155   600 0.258   0.225   0.295 95%        proportion of 1 Wilson
#> 2  2016   174   498 0.349   0.309   0.392 95%        proportion of 1 Wilson

# default rate - grouped
df %>%
  group_by(year) %>%
  phe_rate(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop  value lowercl uppercl confidence statistic       method
#>   <int> <int> <int>  <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   155   600 25833.  21926.  30236. 95%        rate per 100000 Byars 
#> 2  2016   174   498 34940.  29941.  40535. 95%        rate per 100000 Byars



Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

# default mean
phe_mean(df,obs)
#>   value_sum value_count    stdev  value  lowercl  uppercl confidence statistic
#> 1       329           8 34.38204 41.125 12.38089 69.86911        95%      mean
#>                     method
#> 1 Student's t-distribution

# multiple means in a single execution with 99.8% confidence
df %>%
    group_by(year) %>%
        phe_mean(obs, confidence=0.998)
#> # A tibble: 2 × 10
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl confidence statistic
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl> <chr>      <chr>    
#> 1  2015       155           4  39.6  38.8   -163.    241. 99.8%      mean     
#> 2  2016       174           4  34.3  43.5   -132.    219. 99.8%      mean     
#> # ℹ 1 more variable: method <chr>

# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
    group_by(year) %>%
        phe_mean(obs, type = "standard", confidence=0.998)
#> # A tibble: 2 × 7
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl>
#> 1  2015       155           4  39.6  38.8   -163.    241.
#> 2  2016       174           4  34.3  43.5   -132.    219.

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

df_std <- data.frame(
            area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
            year = rep(2006:2010, each = 19 * 2),
            sex = rep(rep(c("Male", "Female"), each = 19), 5),
            ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
                           50,55,60,65,70,75,80,85,90), times = 10),
            obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
            pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#>    area year  sex ageband obs   pop
#> 1 Area1 2006 Male       0  16 14525
#> 2 Area1 2006 Male       5  41 11517
#> 3 Area1 2006 Male      10  31 17932
#> 4 Area1 2006 Male      15 168 14798
#> 5 Area1 2006 Male      20  91 19848
#> 6 Area1 2006 Male      25 145 16485

Execute phe_dsr

INPUT: The minimum input requirement for the phe_dsr function is a single data frame with columns representing the numerators and denominators for each standardisation category. This is sufficient if the data is:

  • broken down into 19 standardisation categories per grouping set representing the 19 x 5-year age bands from 00-04 to 90+
  • sorted, within each grouping set, by these 19 age bands in ascending order
  • to be standardised against the 2013 European Standard Population

The 2013 European Standard Population is provided within the package in vector form (esp2013) and is used by default by this function. Alternative standard populations can be used but must be provided by the user. When the function joins a standard population vector to the input data frame it does this by position so it is important that the data is sorted accordingly. This is a user responsibility.

The function can also accept standard populations provided as a column within the input data frame.

  • standard populations provided as a vector - the vector and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order

  • standard populations provided as a column within the input data frame - the standard populations can be appended to the input data frame by the user prior to execution of the function - if the data is grouped to generate multiple dsrs then the standard populations will need to be repeated and appended to the data rows for every grouping set.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If standard populations are being provided as a column within the input data frame then the user must specify this using the stdpoptype argument as the function expects a vector by default. The function also accepts additional arguments to specify the standard populations, the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_dsr function and the arguments that can optionally be specified

# calculate separate dsrs for each area, year and sex
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        2122    262014  788.    751.    825. 95%       
#>  2 Area1  2006 Male          1883    305893  639.    608.    670. 95%       
#>  3 Area1  2007 Female        2036    261799  815.    778.    853. 95%       
#>  4 Area1  2007 Male          1997    289411  708.    674.    743. 95%       
#>  5 Area1  2008 Female        1782    294098  651.    618.    684. 95%       
#>  6 Area1  2008 Male          1461    252190  558.    528.    590. 95%       
#>  7 Area1  2009 Female        2354    299239  787.    753.    823. 95%       
#>  8 Area1  2009 Male          1752    285455  640.    610.    672. 95%       
#>  9 Area1  2010 Female        2008    260319  840.    802.    879. 95%       
#> 10 Area1  2010 Male          1842    277287  685.    653.    719. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, type="standard")
#> # A tibble: 40 × 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        2122    262014  788.    751.    825.
#>  2 Area1  2006 Male          1883    305893  639.    608.    670.
#>  3 Area1  2007 Female        2036    261799  815.    778.    853.
#>  4 Area1  2007 Male          1997    289411  708.    674.    743.
#>  5 Area1  2008 Female        1782    294098  651.    618.    684.
#>  6 Area1  2008 Male          1461    252190  558.    528.    590.
#>  7 Area1  2009 Female        2354    299239  787.    753.    823.
#>  8 Area1  2009 Male          1752    285455  640.    610.    672.
#>  9 Area1  2010 Female        2008    260319  840.    802.    879.
#> 10 Area1  2010 Male          1842    277287  685.    653.    719.
#> # ℹ 30 more rows

# calculate same specifying standard population in vector form
df_std %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        2122    262014  788.    751.    825. 95%       
#>  2 Area1  2006 Male          1883    305893  639.    608.    670. 95%       
#>  3 Area1  2007 Female        2036    261799  815.    778.    853. 95%       
#>  4 Area1  2007 Male          1997    289411  708.    674.    743. 95%       
#>  5 Area1  2008 Female        1782    294098  651.    618.    684. 95%       
#>  6 Area1  2008 Male          1461    252190  558.    528.    590. 95%       
#>  7 Area1  2009 Female        2354    299239  787.    753.    823. 95%       
#>  8 Area1  2009 Male          1752    285455  640.    610.    672. 95%       
#>  9 Area1  2010 Female        2008    260319  840.    802.    879. 95%       
#> 10 Area1  2010 Male          1842    277287  685.    653.    719. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same dsrs by appending the standard populations to the data frame
df_std %>%
    mutate(refpop = rep(esp2013,40)) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs,pop, stdpop=refpop, stdpoptype="field")
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        2122    262014  788.    751.    825. 95%       
#>  2 Area1  2006 Male          1883    305893  639.    608.    670. 95%       
#>  3 Area1  2007 Female        2036    261799  815.    778.    853. 95%       
#>  4 Area1  2007 Male          1997    289411  708.    674.    743. 95%       
#>  5 Area1  2008 Female        1782    294098  651.    618.    684. 95%       
#>  6 Area1  2008 Male          1461    252190  558.    528.    590. 95%       
#>  7 Area1  2009 Female        2354    299239  787.    753.    823. 95%       
#>  8 Area1  2009 Male          1752    285455  640.    610.    672. 95%       
#>  9 Area1  2010 Female        2008    260319  840.    802.    879. 95%       
#> 10 Area1  2010 Male          1842    277287  685.    653.    719. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
    filter(ageband <= 70) %>%
    group_by(area, year, sex) %>%
    phe_dsr(obs, pop, stdpop = esp2013[1:15])
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        1585    205757  779.    740.    819. 95%       
#>  2 Area1  2006 Male          1439    236639  624.    592.    658. 95%       
#>  3 Area1  2007 Female        1743    203479  849.    808.    890. 95%       
#>  4 Area1  2007 Male          1517    223282  697.    661.    735. 95%       
#>  5 Area1  2008 Female        1420    229257  673.    637.    709. 95%       
#>  6 Area1  2008 Male          1071    198871  542.    510.    576. 95%       
#>  7 Area1  2009 Female        1825    231772  792.    755.    831. 95%       
#>  8 Area1  2009 Male          1567    234471  668.    635.    702. 95%       
#>  9 Area1  2010 Female        1673    210904  828.    787.    869. 95%       
#> 10 Area1  2010 Male          1438    219113  667.    632.    703. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>
    
# calculate separate dsrs for persons for each area and year)
df_std %>%
    group_by(area, year, ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last") %>%
    phe_dsr(obs,pop)
#> # A tibble: 20 × 10
#> # Groups:   area, year [20]
#>    area   year total_count total_pop value lowercl uppercl confidence statistic 
#>    <chr> <int>       <int>     <int> <dbl>   <dbl>   <dbl> <chr>      <chr>     
#>  1 Area1  2006        4005    567907  701.    678.    725. 95%        dsr per 1…
#>  2 Area1  2007        4033    551210  761.    736.    787. 95%        dsr per 1…
#>  3 Area1  2008        3243    546288  597.    575.    619. 95%        dsr per 1…
#>  4 Area1  2009        4106    584694  714.    692.    738. 95%        dsr per 1…
#>  5 Area1  2010        3850    537606  754.    729.    779. 95%        dsr per 1…
#>  6 Area2  2006        3636    574690  587.    566.    608. 95%        dsr per 1…
#>  7 Area2  2007        4504    559264  823.    797.    849. 95%        dsr per 1…
#>  8 Area2  2008        3648    554981  658.    635.    681. 95%        dsr per 1…
#>  9 Area2  2009        3819    552990  732.    708.    756. 95%        dsr per 1…
#> 10 Area2  2010        3948    556917  730.    706.    754. 95%        dsr per 1…
#> 11 Area3  2006        3401    602772  584.    563.    605. 95%        dsr per 1…
#> 12 Area3  2007        4240    575772  751.    727.    775. 95%        dsr per 1…
#> 13 Area3  2008        4068    565843  721.    698.    745. 95%        dsr per 1…
#> 14 Area3  2009        3746    549496  712.    688.    737. 95%        dsr per 1…
#> 15 Area3  2010        3176    538633  567.    546.    589. 95%        dsr per 1…
#> 16 Area4  2006        4183    555916  756.    732.    780. 95%        dsr per 1…
#> 17 Area4  2007        3179    552813  594.    572.    616. 95%        dsr per 1…
#> 18 Area4  2008        4032    544503  742.    717.    767. 95%        dsr per 1…
#> 19 Area4  2009        4334    577014  783.    758.    807. 95%        dsr per 1…
#> 20 Area4  2010        3617    572747  653.    631.    676. 95%        dsr per 1…
#> # ℹ 1 more variable: method <chr>

Execute calculate_ISRatio and calculate_ISRate

INPUT: Unlike the phe_dsr function, there is no default standard or reference data for the calculate_ISRatio and calculate_ISRate functions. These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

  • reference data provided as a data frame or as vectors - the data frame/vectors and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order.

  • reference data provided as columns within the input data frame - the reference numerators and denominators can be appended to the input data frame prior to execution of the function - if the data is grouped to generate multiple indirectly standardised rates or ratios then the reference data will need to be repeated and appended to the data rows for every grouping set.

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (ISRate only), the indirectly standardised rate or ratio, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

df_ref <- df_std %>%
    filter(year == 2006) %>%
    group_by(ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last")
    
head(df_ref)
#> # A tibble: 6 × 3
#>   ageband   obs    pop
#>     <dbl> <int>  <int>
#> 1       0   753 115387
#> 2       5   813 110861
#> 3      10   747 137533
#> 4      15   901 107845
#> 5      20   752 127510
#> 6      25   899 119024

Here are some example code chunks to demonstrate the calculate_ISRatio function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl confidence
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female     2122    1753. 1.21    1.16    1.26  95%       
#>  2 Area1  2006 Male       1883    2042. 0.922   0.881   0.965 95%       
#>  3 Area1  2007 Female     2036    1736. 1.17    1.12    1.22  95%       
#>  4 Area1  2007 Male       1997    1960. 1.02    0.975   1.06  95%       
#>  5 Area1  2008 Female     1782    1917. 0.930   0.887   0.974 95%       
#>  6 Area1  2008 Male       1461    1672. 0.874   0.830   0.920 95%       
#>  7 Area1  2009 Female     2354    2008. 1.17    1.13    1.22  95%       
#>  8 Area1  2009 Male       1752    1870. 0.937   0.894   0.982 95%       
#>  9 Area1  2010 Female     2008    1736. 1.16    1.11    1.21  95%       
#> 10 Area1  2010 Male       1842    1842. 1.00    0.955   1.05  95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same smrs by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, refobs, refpop, refpoptype="field",
                      type = "standard")
#> # A tibble: 40 × 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     2122    1753. 1.21    1.16    1.26 
#>  2 Area1  2006 Male       1883    2042. 0.922   0.881   0.965
#>  3 Area1  2007 Female     2036    1736. 1.17    1.12    1.22 
#>  4 Area1  2007 Male       1997    1960. 1.02    0.975   1.06 
#>  5 Area1  2008 Female     1782    1917. 0.930   0.887   0.974
#>  6 Area1  2008 Male       1461    1672. 0.874   0.830   0.920
#>  7 Area1  2009 Female     2354    2008. 1.17    1.13    1.22 
#>  8 Area1  2009 Male       1752    1870. 0.937   0.894   0.982
#>  9 Area1  2010 Female     2008    1736. 1.16    1.11    1.21 
#> 10 Area1  2010 Male       1842    1842. 1.00    0.955   1.05 
#> # ℹ 30 more rows

The calculate_ISRate function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate indirectly standardised rates for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 12
#> # Groups:   area, year, sex [40]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <chr> <int> <chr>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema…     2122    1753.     662.  801.    767.    836. 95%       
#>  2 Area1  2006 Male      1883    2042.     662.  610.    583.    638. 95%       
#>  3 Area1  2007 Fema…     2036    1736.     662.  776.    743.    810. 95%       
#>  4 Area1  2007 Male      1997    1960.     662.  674.    645.    704. 95%       
#>  5 Area1  2008 Fema…     1782    1917.     662.  615.    587.    644. 95%       
#>  6 Area1  2008 Male      1461    1672.     662.  578.    549.    609. 95%       
#>  7 Area1  2009 Fema…     2354    2008.     662.  775.    744.    807. 95%       
#>  8 Area1  2009 Male      1752    1870.     662.  620.    591.    650. 95%       
#>  9 Area1  2010 Fema…     2008    1736.     662.  765.    732.    800. 95%       
#> 10 Area1  2010 Male      1842    1842.     662.  662.    632.    692. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same indirectly standardised rates by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, refobs, refpop, refpoptype="field",
                     type = "standard")
#> # A tibble: 40 × 9
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     2122    1753.     662.  801.    767.    836.
#>  2 Area1  2006 Male       1883    2042.     662.  610.    583.    638.
#>  3 Area1  2007 Female     2036    1736.     662.  776.    743.    810.
#>  4 Area1  2007 Male       1997    1960.     662.  674.    645.    704.
#>  5 Area1  2008 Female     1782    1917.     662.  615.    587.    644.
#>  6 Area1  2008 Male       1461    1672.     662.  578.    549.    609.
#>  7 Area1  2009 Female     2354    2008.     662.  775.    744.    807.
#>  8 Area1  2009 Male       1752    1870.     662.  620.    591.    650.
#>  9 Area1  2010 Female     2008    1736.     662.  765.    732.    800.
#> 10 Area1  2010 Male       1842    1842.     662.  662.    632.    692.
#> # ℹ 30 more rows