Lab 7: Poisson Regression and Effect Modification

Chris Slaughter

Introduction

In this lab, we will estimate the association between salary and gender and consider administrative responsibilities a potential effect modifier. To provide a Poisson Regresion review, we will consider salary to be a rate (dollars per month) and use the following Poisson regression model for the log of the salary rate (\(\lambda_i\)). We will use robust standard error estimates

\(\textrm{log}(\lambda_i) = \beta_0 + \beta_M * \textrm{Male} + \beta_A * \textrm{Admin} + \beta_{MA} * \textrm{Male} * \textrm{Admin}\)

Setup

Load packages

Loading required package: Hmisc

Attaching package: 'Hmisc'
The following objects are masked from 'package:base':

    format.pval, units
Loading required package: survival
Loading required package: MASS

Attaching package: 'biostat3'
The following object is masked from 'package:survival':

    colon
Loading required package: carData

Attaching package: 'car'
The following objects are masked from 'package:rms':

    Predict, vif

Attaching package: 'dplyr'
The following object is masked from 'package:car':

    recode
The following object is masked from 'package:MASS':

    select
The following objects are masked from 'package:Hmisc':

    src, summarize
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Part 1: Initial dataset manipulation

1. Read in the salary dataset. Remove all observations that are not from 1995. Create an indicator variable for male gender

# Initial dataset manipulations

data <- stata.get("http://biostat.app.vumc.org/wiki/pub/Main/CourseBios312/salary.dta")
data95 <- subset(data, year=="95")
data95$male <- as.numeric(data95$sex=="M")

Part 2: Poisson Model

2.1 In the above model, we did not specify an offset. Why was an offset not used needed in this situation? Give an example of where it might be appropriate to include an offset for the amount of money earned.

2.2 An underlying assumption of the Poisson model is that the mean equals the varaince. Consider the following summary statistics of the mean and the variance of salary overall and by admin duties and sex.

data95 %>% summarize(mean = mean(salary, na.rm = TRUE),
                     var = var(salary, na.rm=TRUE))
      mean     var
1 6389.808 4148443
data95 %>% group_by(sex, admin) %>%
  summarize(mean = mean(salary, na.rm = TRUE),
                     var = var(salary, na.rm=TRUE))
`summarise()` has grouped output by 'sex'. You can override using the `.groups`
argument.
# A tibble: 4 × 4
# Groups:   sex [2]
  sex   admin  mean      var
  <chr> <int> <dbl>    <dbl>
1 F         0 5280. 1988395.
2 F         1 6770. 2647747.
3 M         0 6507. 4010738.
4 M         1 8458. 3756893.

2.2.1 For evaluating the model assumption that the mean equals the variance, which set of output (the overall or stratified by admin and sex) corresponds to the effect modification model specified in the introduction? Is the assumption that mean equals the variance an assumption on the (marginal) distribution of salary or the conditional distribution of salary given covariates?

2.2.2 Note that the stratum specific variance estimates are smaller than the overall variance because admin and male are significant predictors of salary. It is likely that we have not included important covariates from the model (e.g. years of experience, rank, field). Speculate on how inclusion of additional predictors could impact the conditional variance, and the mean-variance relationship assumption of the Poisson model.

2.2.3 Why should we use robust standard error estimates? Should we base our decision to use robust standard error estimate on exploratory analyses like what is given above?

2.2.4 Use the estimates of the stratum-specific means to find the estimate \(\beta_0\), \(\beta_M\), \(\beta_A\), and \(\beta_{MA}\) that will be obtained from the Poisson regression model.

Part 3: Effect modification model parameter interpretation

3.1 Fit the given model and use it to answer the following questions

For each of the following scientific questions, specify the null and alternative hypothesis need to answer the question based on this model. Then, conduct the appropriate test and interpret the results.

3.2 Gender effects

3.2.1 Is there a difference in mean salary, males compared to females, among faculty without administrative duties?

3.2.2 Is there a difference in mean salary, males compared to females, among faculty with administrative duties?

3.2.3 Is there any difference in mean salary, males compared to females (that is, in faculty with or without administrative duties)?

3.3 Administrative effects

3.3.1 Is there a difference in mean salary, comparing faculty with admin duties to those without admin duties, among female faculty?

3.3.2 Is there a difference in mean salary, comparing faculty with admin duties to those without admin duties, among male faculty?

3.3.3 Is there a any difference in mean salary, those with admin duties compared to those without admin duties (that is, in male or female faculty)?

3.4 Effect Modification

3.4.1 Is there evidence that administrative duties modifies the difference in salary, males compared to females?

3.4.2 Is there evidence that gender modifies the difference in salary comparing faculty with administrative duties to faculty without administrative duties?

Part 4: Comparison to other Poisson models

4.1 Impact of not including the effect modifier, \(\beta_{MA}\) in the model. Suppose we had fit the following model instead. What can we say about our estimates of \(\beta_M\) and \(\beta_A\) from this model? Fit the model to see if you are right.

\(\textrm{log}(\lambda_i) = \beta_0 + \beta_M * \textrm{Male} + \beta_A * \textrm{Admin}\)

4.2 Impact of not including the effect modifier, \(\beta_{MA}\) or the main effect \(\beta_A\) in the model. Suppose we had fit the following model instead.

\(\textrm{log}(\lambda_i) = \beta_0 + \beta_M * \textrm{Male}\)

4.2.1 If Admin duties is a confounder, what can we say about our estimate of \(\beta_M\) from this model?

4.22 If Admin duties is not a confounder, what can we say about our estimate of \(\beta_M\) from this model?