ANOVA and Model Choice

Lecture 12

Chris Slaughter

October 30, 2025

Code
library(rms)
library(ggplot2)
library(splines)
library(lspline)
library(lmtest)
library(sandwich)
library(car)
library(kableExtra)
library(plotrix)
tryCatch(source('pander_registry.R'), error = function(e) invisible(e))

1 Overview

  • There are many different ways we can model predictors

  • Consider what alternative models will make scientific sense

  • What is the impact of letting the data drive the selection of a model?

  • I am going to discuss this in terms of a clinical trial where we have replicates at each dose level

    • We can find dose-specific means and compare modeling approaches

2 ANOVA versus Linear Continuous Models

  • Compare power of linear continuous models versus ANOVA as a function

    • of trend in means AND

    • standard errors within groups

  • ANOVA (dummy variables)

    • Uses indicator variables for every dose (group) level

      • Again, I am thinking about “dose” in a general sense that could include covariates like age, cholesterol, blood pressure, etc.

      • Traditionally, dose would just be dose of some treatment

    • Fits group means exactly (saturated model)

      • One way ANOVA: One categorical predictor

      • Two way ANOVA: Two categorical predictors

        • Fit with the interactions to get group means exactly
    • Saturated models do not mix random error with systematic error

      • Systematic error: Error due to differences between sample means and predicted means

      • Random error: Error that cannot be explained after controlling for dose

    • ANOVA ignores the ordering of the groups, so it gains no power from trends

      • e.g. does not assume that the difference between the dose=15 and dose=30 groups is similar to the difference between the dose=30 and dose=45 groups

      • In fact, the same level of significance is obtained no matter what permutation of the dose groups is used

  • Linear continuous models

    • Borrows information across groups

      • Accurate and efficient if the model is correct
    • If model is incorrect, mixes random and systematic error

      • Will have some systematic error because the means are not predicted exactly
    • Can gain power from ordering of groups in order to detect a trend

      • But, no matter how low the standard error is, if there is no trend in the mean, there is no statistical significance
Code
n <- 10
dose.levels <- c(0,20,40,60,80)
dose <- rep(dose.levels, n)
set.seed(37)

y1 <- 300 + 5*dose + rnorm(5*n, 0, 200)
y2 <- 300 + 200*(dose==20) + 100*(dose==40) - 100*(dose==60) + 0*(dose==80) + rnorm(5*n, 0, 200)

m1 <- lm(y1 ~ dose)
m2 <- lm(y1 ~ factor(dose))

#m3 <- lm(y2 ~ dose)
m4 <- lm(y2 ~ factor(dose))

p1 <- predict(m1, newdata=data.frame(dose=dose.levels), se.fit=TRUE)
p2 <- predict(m2, newdata=data.frame(dose=dose.levels), se.fit=TRUE)

#p3 <- predict(m3, newdata=data.frame(dose=dose.levels), se.fit=TRUE)
p4 <- predict(m4, newdata=data.frame(dose=dose.levels), se.fit=TRUE)
Code
par(mfrow=c(2,2))
plotCI(x=dose.levels, y=p1$fit, p1$se.fit, ylab="Response", xlab="Dose", ylim=c(0,1000), main="Linear: High power; ANOVA: High Power")
plotCI(x=dose.levels, y=p1$fit, 3*p1$se.fit, ylab="Response", xlab="Dose", ylim=c(0,1000), main="Linear: Mod power; ANOVA: Low Power")

plotCI(x=dose.levels[c(1,5,4,2,3)], y=p1$fit, p1$se.fit, ylab="Response", xlab="Dose", ylim=c(0,1000), main="Linear: No power; ANOVA: High Power")
plotCI(x=dose.levels[c(1,5,4,2,3)], y=p1$fit, 3*p1$se.fit, ylab="Response", xlab="Dose", ylim=c(0,1000), main="Linear: No power; ANOVA: Low Power")

Comparison of power using ANOVA versus linear dose for detecting a dose effect under various dose-response relationships
  • Other options for modeling continuous predictors (a brief sketch follows this list)

    • Combinations of linear trends and indicator variables

    • Splines

    • Polynomials

    • etc.
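
To illustrate two of these options, here is a minimal, hypothetical sketch using the simulated dose and y1 from the code above (ns() from the splines package, lspline() from the lspline package; the 3 df and the knot at dose = 40 are arbitrary choices for illustration):

Code
# Natural cubic spline and piecewise-linear spline fits to the simulated data
m.ns <- lm(y1 ~ ns(dose, df = 3))           # splines::ns with 3 df
m.lsp <- lm(y1 ~ lspline(dose, knots = 40)) # lspline::lspline, knot at dose = 40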

3 Choice of Transformation

  • The exact form used to model predictors should be based on scientific (first) and statistical (second) criteria

  • Scientific issues

    • The form used to model predictors must address the specific scientific question

      • Should be the next logical step in the process of investigating the overall goal

      • First, establish some sort of an association

      • Second, detect a first-order trend

      • Third, detect specific forms of non-linearity

        • Threshold effects?

        • U- or S-shaped trends?

      • Finally, more complex models

    • When the scientific question relates to prediction, it is imperative that the regression model accurately reflects the true relationship between predictors and the summary measure of response

      • Failure to use the correct model guarantees that some groups will have incorrect predicted responses
    • When the scientific question relates to detection of associations, the importance of having the true model depends on the statistical role of the predictor

      • With the predictor of interest, the most important issue is to protect the validity of the statistical inference

        • Data-driven decisions will inflate the type I error rate
      • With precision variables, it is not as crucial that the true relationship be modeled

        • An approximate model will provide most of the precision gains
      • With confounders, failure to accurately model the relationship between the confounder and the response may lead to residual confounding

        • Sometimes we will use very flexible models for continuous confounders as there is little cost to doing so and the potential for important gain
    • As the goal of any analysis is to communicate findings to the greater scientific community, it is also important that modeling of predictors is easy to understand

      • This is an issue that matters most for your predictor of interest

      • We are generally not worried about making inference about precision variables or confounders

  • Statistical issues

    • The greatest statistical precision will be gained when the model reflects the true relationship between the predictor and the response

      • Accurate modeling of the relationship will avoid introducing systematic error in the estimates of the standard errors

      • Parsimony: Using the fewest parameters to model the relationship will allow greater precision

      • Precision is a trade-off between parsimony and increased accuracy from including more parameters

    • We should select the form of modeling the predictor before looking at the data

      • Data-driven selection of transformations will tend to lead to inaccurate (anti-conservative) statistical inference

      • Overfitting of the data leads to spuriously low estimates of the within group variability

        • Thus, standard error estimates are too low

        • Type-I errors are inflated

        • Confidence intervals are too narrow (inaccurate coverage probabilities)

      • Data-driven model selection will also lead to coefficient estimates that are biased away from the null (leading you to overstate your scientific effects)

4 Example: Beta Carotene Supplements

4.1 Overview

  • Before doing large scale clinical trials, it is important to understand the pharmacokinetics of a drug

  • Phase II prevention trials often administer a drug in various doses to volunteers, and pertinent plasma levels are then measured at regular intervals

  • Of particular interest is how dose level affects the build up of drug in the plasma over time, as well as how the dose level might affect other blood chemistry

  • Forty-six (46) volunteers were randomly assigned to receive one of five doses of beta-carotene (0, 15, 30, 45, or 60 mg/day) for 9 months in a double blind fashion

  • The specific aim was to determine how different dose levels affected the serum beta-carotene levels after 9 months

  • Other measured variables available in this data set include subject age, sex, weight, body mass index, percent body fat, and serum cholesterol level at baseline

Code
carot <- stata.get("data/carot.dta")
res1 <- aggregate(carot3 ~ dose, data = carot,
FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x),
min = min(x), q25 = quantile(x, 0.25),
median = median(x), q75 = quantile(x, 0.75),
max = max(x)))
data.frame(dose = res1$dose, res1$carot3, check.names = FALSE)
dose  n     mean      sd      min      q25   median      q75     max
   0  7   186.32  87.798    84.50   133.00    149.0   240.88   323.0
  15  8  1253.58 570.467   576.75   722.69   1250.0  1682.44  2018.8
  30  9  1504.61 479.026   849.33  1157.33   1498.5  1840.00  2248.5
  45  7  1749.08 579.049   950.25  1333.00   1848.2  2234.33  2310.4
  60  9  1877.63 429.880  1233.33  1724.67   1865.0  1917.67  2855.0
  • In this randomized trial, we can consider several potential response variables

    • Plasma level at the end of treatment

    • Change in plasma level over the treatment period

    • Either of the above adjusted for baseline plasma (ANCOVA model)

  • Accounting for baseline

    • Dose group \(i\), subject \(j\), time \(t\)

    • \(Y_{ijt} \sim (\mu_{it}, \sigma^2)\); \(\textrm{corr}(Y_{ij0}, Y_{ij9}) = \rho\) \[\begin{aligned} \overline{Y}_{i\cdot9} & \sim & \left(\mu_{i9}, \sigma^2/n \right) \\ \overline{Y}_{i\cdot9} - \overline{Y}_{i\cdot0} & \sim & \left(\mu_{i9} - \mu_{i0}, 2\sigma^2(1-\rho)/n \right) \\ \overline{Y}_{i\cdot9} - \rho \overline{Y}_{i\cdot0} & \sim & \left(\mu_{i9} - \rho \mu_{i0}, \sigma^2(1-\rho^2)/n \right)\end{aligned}\]

    • Compare the variances of the three quantities above (see the sketch below)

      • When are the variances equal, smaller, or larger?

      • Which is always the smallest?

  • By randomization, there will be equal means at baseline

    • \(\mu_{T,0} = \mu_{P,0}\) where \(T\) is any of the treatment doses and \(P\) is placebo
  • Contrast across dose groups \[\begin{aligned} \overline{Y}_{T,\cdot9} - \overline{Y}_{P,\cdot9} & \sim & \left(\mu_{T,9} - \mu_{P,9}, 2\sigma^2/n \right) \\ \left(\overline{Y}_{T,\cdot9} - \overline{Y}_{T, \cdot0}\right) - \left(\overline{Y}_{P,\cdot9} - \overline{Y}_{P, \cdot0}\right) & \sim & \left(\mu_{T,9} - \mu_{P,9}, 4\sigma^2(1-\rho)/n \right) \\ \left(\overline{Y}_{T,\cdot9} - \rho \overline{Y}_{T,\cdot0}\right) - \left(\overline{Y}_{P,\cdot9} - \rho \overline{Y}_{P,\cdot0}\right) & \sim & \left(\mu_{T,9} - \mu_{P,9}, 2\sigma^2(1-\rho^2)/n \right)\end{aligned}\]
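
As a check on the questions above, a minimal sketch tabulating the variance factors (in units of \(\sigma^2/n\)) for the three treatment-versus-placebo contrasts:

Code
# Variance factors (times sigma^2/n) for the three contrasts above
rho <- seq(0, 1, by = 0.25)
data.frame(rho,
           followup = 2,               # follow-up means only
           change   = 4 * (1 - rho),   # change scores
           ancova   = 2 * (1 - rho^2)) # baseline-adjusted (ANCOVA)
# Change scores beat follow-up only when rho > 1/2;
# the ANCOVA contrast is never larger than either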

  • Simple linear regression

    • Regress \(Y\) on \(X\)

      • \(Y_i \sim \left(\mu_Y, \sigma^2_Y\right)\); \(X_i \sim \left(\mu_X, \sigma^2_X\right)\)

      • \(\textrm{corr}\left(Y_i, X_i \right) = \rho\)

    • Regression model: \(E[Y_i | X_i] = \beta_0 + \beta_1 X_i\)

      • \(\beta_0 = \mu_Y - \beta_1 \mu_X\)

      • \(\beta_1 = \rho \frac{\sigma_Y}{\sigma_X}\)

  • Analysis of Covariance

    • Dose group \(i\), subject \(j\), time \(t\)

    • \(Y_{ijt} \sim (\mu_{it}, \sigma^2)\); \(\textrm{corr}(Y_{ij0}, Y_{ij9}) = \rho\)

    • Regression model: \(E[Y_{ij9} | Y_{ij0}] = \beta_0 + \beta_1 Y_{ij0}\)

      • \(\beta_1 = \rho \frac{\sigma}{\sigma} = \rho\) (applying the simple linear regression result above with \(\sigma_Y = \sigma_X = \sigma\))

4.2 Methods for modeling dose response

  • In a randomized clinical trial, we will tend to have the greatest precision if we adjust for baseline as a predictor in a linear regression model

  • A wide variety of models may be considered for examining the relationship between dose and plasma levels

    • Dummy variables where we model each dose level independently, without borrowing information across groups (ANOVA)

    • Linear continuous predictors (transformed or untransformed)

    • Dichotomization (at any of several thresholds)

    • Polynomials, splines, other flexible methods

    • Combinations of the above

    • Even more complex models

  • I will compare possible models

    • Graphically: Show data and fitted values without adjustment for baseline

    • Numerically: Show regression estimates and tests after adjustment for baseline

    • Note that this is an academic exercise and not something you would do in practice to come up with the “best” model

  • Predicted values

    • After fitting a regression model, Stata will provide predicted values for each case

      • Mathematically, this is just the intercept plus the regression parameters multiplied by the covariates for each case

      • Stata command: predict varname (an R analogue is sketched below)
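
A minimal R analogue (a sketch; the model mirrors the ANOVA-with-baseline fit in the next subsection, and yhat is just an illustrative name):

Code
# Sketch of Stata's -predict- in R: the fitted value for each case is the
# intercept plus the estimated coefficients times that case's covariates
m.pred <- lm(carot3 ~ factor(dose) + carot0, data = carot)
carot$yhat <- fitted(m.pred) # analogous to: predict yhat
head(carot$yhat)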

4.3 ANOVA analysis

  • Fits each group independently

  • Does not use the ordering of the dose groups when looking for an effect

    • Completely ignores the magnitude and ordering of the \(x\)-axis
  • A priori, we might expect this is not the most efficient method if the alternative hypothesis is true

    • We expect larger plasma levels with increasing dose

    • We will thus have less power to detect a first-order trend

Code
carot$dose.factor <- factor(carot$dose)
m.factor <- lm(carot3 ~ dose.factor + carot0, data=carot)
coeftest(m.factor, vcov=sandwich)
t test of coefficients
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -361.4516 154.46722 -2.3400 0.02529265
dose.factor15 1224.1896 196.89130 6.2176 0.00000045
dose.factor30 1439.8373 143.63572 10.0242 0.00000000
dose.factor45 1678.9839 154.10485 10.8951 0.00000000
dose.factor60 1791.0090 141.00926 12.7014 0.00000000
carot0 1.9028 0.49509 3.8433 0.00050617
Code
plot(jitter(carot$dose, factor=.2), carot$carot3, ylab="Plasma Beta Carotene", xlab="Dose of Supplementation", main="Model: Dummy Variables (ANOVA)")
m1 <- lm(carot3~factor(dose), data=carot)
points(seq(0,60, by=15), predict(m1, newdata=data.frame(dose=seq(0,60, by=15))), pch="O", cex=2)
points(seq(0,60, by=15), predict(m1, newdata=data.frame(dose=seq(0,60, by=15))), pch="X", cex=2)
lines(seq(0,60, by=15), predict(m1, newdata=data.frame(dose=seq(0,60, by=15))))
legend("topleft", c("Fitted", "Group Means"), pch=c("X","O"), lty=c(NA,1), bty="n")

  • Testing for the dose effect

    • In Stata, we must use the testparm command (or test) because the model includes the baseline measurement

    • testparm is similar to test, but allows testing multiple parameters using wildcards

    • In R, car::linearHypothesis() plays the same role, as shown below

Code
linearHypothesis(m.factor, c("dose.factor15",
                       "dose.factor30",
                       "dose.factor45",
                       "dose.factor60"),
                 vcov=sandwich(m.factor))
Linear hypothesis test
  Res.Df Df      F Pr(>F)
1     38
2     34  4 69.959      0
  • We would have had the same fitted values (and thus inference) if we had decided to drop a different dose group

    • Example: Using dose 60 as the reference group (via relevel, equivalent to making your own dummy variables)
Code
carot$dose.new <- relevel(carot$dose.factor, ref="60")

m2.factor <- lm(carot3 ~ dose.new + carot0, data=carot)
coeftest(m2.factor, vcov=sandwich)
t test of coefficients
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 1429.5573 163.01893 8.76927 3.0000e-10
dose.new0 -1791.0090 141.00926 -12.70136 0.0000e+00
dose.new15 -566.8194 221.47493 -2.55929 1.5108e-02
dose.new30 -351.1717 173.86127 -2.01984 5.1337e-02
dose.new45 -112.0251 188.55038 -0.59414 5.5635e-01
carot0 1.9028 0.49509 3.84332 5.0617e-04
Code
linearHypothesis(m2.factor, c("dose.new0",
                       "dose.new15",
                       "dose.new30",
                       "dose.new45"),
                 vcov=sandwich(m2.factor))
Linear hypothesis test
  Res.Df Df      F Pr(>F)
1     38
2     34  4 69.959      0
  • Note that the parameter estimates all lead to the same fitted values

    • e.g. the intercept in the above model (1430) equals the intercept plus the dose60 coefficient (-361 + 1791) from the previous model
  • Overall F statistics, R-squared, Root MSE all the same

  • Partial t-tests tend to differ as we are making comparisons to different reference groups

  • Could also fit the same model with no intercept

    • Would then have to include all five dose groups

    • We can get Stata to fit all five dose groups and no intercept using the noconstant option

    • In R, fit a model without an intercept by adding a \(-1\) in the model equation (e.g. \(y \sim -1 + x\))

    • Not including the intercept changes the overall F statistic and the R-squared measures

Code
m3.factor <- lm(carot3 ~ -1 + dose.factor + carot0, data=carot)
coeftest(m3.factor, vcov=sandwich)
t test of coefficients
  Estimate Std. Error t value Pr(>|t|)
dose.factor0 -361.4516 154.46722 -2.3400 2.5293e-02
dose.factor15 862.7379 223.06284 3.8677 4.7235e-04
dose.factor30 1078.3857 165.01387 6.5351 1.7530e-07
dose.factor45 1317.5322 205.85752 6.4002 2.6150e-07
dose.factor60 1429.5573 163.01893 8.7693 3.0000e-10
carot0 1.9028 0.49509 3.8433 5.0617e-04
  • Correspondence of the no-intercept model compared to previous models

    • Some textbooks refer to this as a “cell means” coding system

      • If we didn’t have baseline beta carotene in the model, the dose parameters would correspond directly to the means in each dose group

      • With baseline beta carotene in the model, the dose parameters are the means when carot0 is 0

    • In terms of model fit, the model is the same as before

      • No intercept means each dose group is compared to a mean of 0
    • Fitted values will be the same

    • Test of dose effect will need to test equality of all five dose covariates

      • This is not a test that these 5 parameters are 0

      • \(H_0: dose0 = dose15 = dose30 = dose45 = dose60\)

      • \(H_1:\) at least one of the above is not equal

Code
linearHypothesis(m3.factor, c("dose.factor0=dose.factor15",
                              "dose.factor0=dose.factor30",
                              "dose.factor0=dose.factor45",
                              "dose.factor0=dose.factor60"),
                 vcov=sandwich(m3.factor))
Linear hypothesis test
  Res.Df Df      F Pr(>F)
1     38
2     34  4 69.959      0

4.4 Binary dose: Placebo versus Active

  • Dichotomize into dose 0 versus dose \(>\) 0

    • Will be an accurate model if all (or virtually all) of the effect is attained at the lowest dose level

    • Often used when little is known about a treatment, or when dose is difficult to quantify

      • e.g. Smoking

      • We are relatively certain of a smoking effect, so our major scientific interest is likely related to the dose-response relationship above the lowest dose

Code
carot$trt <- (carot$dose>0)+0

m.trt <- lm(carot3 ~ trt + carot0, data=carot)
coeftest(m.trt, vcov=sandwich(m.trt))
t test of coefficients
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -406.6816 207.08860 -1.9638 0.0570999
trt 1544.2205 115.44469 13.3763 0.0000000
carot0 2.0599 0.68203 3.0203 0.0045596
Code
m2 <- lm(carot3 ~ dose>0, data=carot)
plot(jitter(carot$dose, factor=.2), carot$carot3, ylab="Plasma Beta Carotene", xlab="Dose of Supplementation", main="Model: Dichotomous Dose")
lines(seq(0,60, by=15), predict(m1, newdata=data.frame(dose=seq(0,60, by=15))), pch="O", cex=2)
points(seq(0,60, by=15), predict(m1, newdata=data.frame(dose=seq(0,60, by=15))), pch="O", cex=2)
points(seq(0,60, by=15), predict(m2, newdata=data.frame(dose=seq(0,60, by=15))), pch="X", cex=2)
legend("topleft", c("Fitted", "Group Means"), pch=c("X","O"), lty=c(NA,1), bty="n")

4.5 Linear, continuous dose

  • Estimates the best fitting straight line to response

    • Accurate if the response is linear
  • Often used when little is known about the treatment and a general trend is expected

    • In this particular application, we are relatively certain of an effect, so our major interest is in modeling the dose response relationship above 0.
Code
m.cont <- lm(carot3 ~ dose + carot0, data=carot)
coeftest(m.cont, vcov=sandwich(m.cont))
t test of coefficients
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 245.0003 214.29566 1.1433 2.6027e-01
dose 25.4628 3.49367 7.2883 1.1700e-08
carot0 1.3341 0.63189 2.1113 4.1565e-02
  • To test the treatment effect, could either use the test command for dose or use the output directly as we are only testing one parameter
Code
m3 <- lm(carot3 ~ dose, data=carot)
plot(jitter(carot$dose, factor=.2), carot$carot3, ylab="Plasma Beta Carotene", xlab="Dose of Supplementation", main="Model: Linear Dose")
lines(seq(0,60, by=15), predict(m1, newdata=data.frame(dose=seq(0,60, by=15))), pch="O", cex=2)
points(seq(0,60, by=15), predict(m1, newdata=data.frame(dose=seq(0,60, by=15))), pch="O", cex=2)
points(seq(0,60, by=15), predict(m3, newdata=data.frame(dose=seq(0,60, by=15))), pch="X", cex=2)
legend("topleft", c("Fitted", "Group Means"), pch=c("X","O"), lty=c(NA,1), bty="n")

4.6 Polynomial models of dose

  • Fit terms involving dose, dose squared

    • Often used to fit U-shaped trends

    • In general, a quadratic is a pretty strong assumption in that it assumes constant curvature over dose

Code
m.poly <- lm(carot3 ~ poly(dose,2) + carot0, data=carot)
coeftest(m.poly, vcov=sandwich(m.poly))
t test of coefficients
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 935.2470 157.31460 5.9451 8.2290e-07
poly(dose, 2)1 3652.1148 377.96287 9.6626 0.0000e+00
poly(dose, 2)2 -1716.6346 351.59782 -4.8824 2.1511e-05
carot0 1.7281 0.53494 3.2304 2.6414e-03
Code
m4 <- lm(carot3 ~ poly(dose, 2), data=carot)
plot(jitter(carot$dose, factor=.2), carot$carot3, ylab="Plasma Beta Carotene", xlab="Dose of Supplementation", main="Model: Quadratic Dose")
lines(seq(0,60, by=15), predict(m1, newdata=data.frame(dose=seq(0,60, by=15))), pch="O", cex=2)
points(seq(0,60, by=15), predict(m1, newdata=data.frame(dose=seq(0,60, by=15))), pch="O", cex=2)
points(seq(0,60, by=15), predict(m4, newdata=data.frame(dose=seq(0,60, by=15))), pch="X", cex=2)
legend("topleft", c("Fitted", "Group Means"), pch=c("X","O"), lty=c(NA,1), bty="n")

  • The partial t-test for the quadratic term (dosesqr in Stata; poly(dose, 2)2 in R) can be interpreted as a test of the null hypothesis of a linear dose response

    • It is highly significant, suggesting departure from linearity
  • To test the treatment effect, we need to test the two dose covariates

Code
linearHypothesis(m.poly, c("poly(dose, 2)1",
                           "poly(dose, 2)2"),
                 vcov=sandwich(m.poly))
Linear hypothesis test
  Res.Df Df      F Pr(>F)
1     38
2     36  2 93.951      0

4.7 Highest order polynomial models versus ANOVA

  • With 5 discrete dose levels, a 4th degree polynomial will fit the means exactly

  • Thus, the model will have the same fit as the ANOVA model using dummy variables for each level of dose (this equivalence is verified in R after the Stata output below)

    • Higher order polynomials are borrowing less information across dose groups

    • Highest order polynomial borrows no information across dose groups

. gen dosecub = dose^3
. gen dosequad = dose^4
. regress carot3 dose dosesqr dosecub dosequad carot0, robust

Linear regression                                      Number of obs =      40
                                                       F(  5,    34) =   47.68
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7184
                                                       Root MSE      =  417.46

------------------------------------------------------------------------------
             |               Robust
      carot3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dose |    157.876   61.94333     2.55   0.015     31.99197    283.7599
     dosesqr |  -6.943752   5.066692    -1.37   0.180    -17.24051    3.353004
     dosecub |   .1385695   .1313523     1.05   0.299    -.1283706    .4055096
    dosequad |  -.0009734   .0010718    -0.91   0.370    -.0031515    .0012047
      carot0 |   1.902792   .5370015     3.54   0.001     .8114738     2.99411
       _cons |  -361.4516   167.5432    -2.16   0.038    -701.9404   -20.96284
------------------------------------------------------------------------------


. xi: regress carot3 i.dose carot0, robust
i.dose            _Idose_0-60         (naturally coded; _Idose_0 omitted)

Linear regression                                      Number of obs =      40
                                                       F(  5,    34) =   47.68
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7184
                                                       Root MSE      =  417.46

------------------------------------------------------------------------------
             |               Robust
      carot3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Idose_15 |    1224.19   213.5586     5.73   0.000     790.1863    1658.193
   _Idose_30 |   1439.837   155.7948     9.24   0.000     1123.224     1756.45
   _Idose_45 |   1678.984   167.1502    10.04   0.000     1339.294    2018.674
   _Idose_60 |   1791.009    152.946    11.71   0.000     1480.185    2101.833
      carot0 |   1.902792   .5370015     3.54   0.001     .8114738     2.99411
       _cons |  -361.4516   167.5432    -2.16   0.038    -701.9404   -20.96284
------------------------------------------------------------------------------
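
The same equivalence can be checked in R (a sketch using the carot data frame loaded earlier):

Code
# A 4th-degree polynomial in dose spans the same column space as the
# dose indicators, so the two models give identical fitted values
m.poly4 <- lm(carot3 ~ poly(dose, 4) + carot0, data = carot)
m.anova5 <- lm(carot3 ~ factor(dose) + carot0, data = carot)
all.equal(unname(fitted(m.poly4)), unname(fitted(m.anova5))) # TRUE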

4.8 Threshold at 0 and Linear Term

  • Threshold at 0 and linear dose

  • To fit, use a dummy variable for treatment (dose \(>\) 0) plus continuous dose

    • Fits dose 0 by its group mean

    • Fits dose \(>\) 0 by a line (an intercept and slope)

    • Allows us to address two scientific questions

      • Is there any effect of dose? (jointly test both coefficients)

      • Is there any additional benefit beyond the lowest dose? (test linear term’s slope)

. regress carot3 trt dose carot0, robust

Linear regression                                      Number of obs =      40
                                                       F(  3,    36) =   81.26
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7170
                                                       Root MSE      =  406.69

------------------------------------------------------------------------------
             |               Robust
      carot3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         trt |   1050.836    223.418     4.70   0.000     597.7237    1503.949
        dose |   12.81144   4.794314     2.67   0.011     3.088122    22.53476
      carot0 |   1.904584   .5164903     3.69   0.001     .8570932    2.952075
       _cons |  -361.9675   161.4254    -2.24   0.031    -689.3534   -34.58168
------------------------------------------------------------------------------
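
The analogous fit in R, with the robust standard errors used elsewhere in these notes:

Code
# Threshold (any treatment) plus linear dose, adjusted for baseline
m.thresh <- lm(carot3 ~ trt + dose + carot0, data = carot)
coeftest(m.thresh, vcov = sandwich(m.thresh))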
Code
m5 <- lm(carot3 ~ (dose >0) + dose, data=carot)
plot(jitter(carot$dose, factor=.2), carot$carot3, ylab="Plasma Beta Carotene", xlab="Dose of Supplementation", main="Model: Threshold and Linear Dose")
lines(seq(0,60, by=15), predict(m1, newdata=data.frame(dose=seq(0,60, by=15))), pch="O", cex=2)
points(seq(0,60, by=15), predict(m1, newdata=data.frame(dose=seq(0,60, by=15))), pch="O", cex=2)
points(seq(0,60, by=15), predict(m5, newdata=data.frame(dose=seq(0,60, by=15))), pch="X", cex=2)
legend("topleft", c("Fitted", "Group Means"), pch=c("X","O"), lty=c(NA,1), bty="n")

  • Testing the effect of treatment

    • Two variables model dose, so we need to test both

    • If response increases from dose 0 to lowest dose OR

    • … response increases as dose increases, THEN

    • … we will declare an effect of treatment

  • The partial t-test for the trt term can be interpreted as a test of whether a purely linear dose-response model is adequate

    • Here, it is highly significantly different from 0, indicating that a purely linear model is not adequate
  • The partial t-test for the dose term can be interpreted as a test for any added effect above the lowest dose

    • It is significantly different from 0 (\(p = 0.011\))

    • There is a multiple comparison issue here, but many people are comfortable doing this ‘step down’ test after they have already tested for any treatment effect

. test trt dose

 ( 1)  trt = 0
 ( 2)  dose = 0

       F(  2,    36) =  121.88
            Prob > F =    0.0000
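
The corresponding joint test in R (using m.thresh from the sketch above):

Code
# Joint test that both dose terms are zero, i.e. no treatment effect
linearHypothesis(m.thresh, c("trt", "dose"), vcov = sandwich(m.thresh))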

5 Data driven model selection

  • Suppose we look at a scatterplot before deciding which model we fit and choose a model that can fit the data well

    • If the data looks like a straight line, choose the model linear in dose

    • If the data looks like a U, choose a quadratic

    • If the data show a complicated pattern of differences among groups, we might choose dummy variables or splines

    • etc.

  • This approach would tend to mimic the behavior of fitting several different models and choosing the model with the lowest \(p\)-value

    • When our eye sees some trend in the data, we would be most likely to pick the model giving the lowest \(p\)-value

5.1 Simulation

  • Using the 46 subjects in this dataset, I can randomly permute the dose they received

    • Effectively, randomize subjects to a different dose

    • But, keep their 9-month and baseline beta carotene levels the same (not permuted)

      • Should remove any association between dose and beta carotene
  • Next, fit each of the five models (linear, quadratic, ANOVA, dichotomized, and dichotomized plus linear)

  • Repeat the process 1000 times (representing 1000 studies)

    • Calculate how often each model rejects the null hypothesis of a dose effect
Code
set.seed(80)

reps <- 1000
n <- length(carot$carot3)
p.vals <- matrix(NA, nrow=reps, ncol=5)

y <- carot$carot3
y0 <- carot$carot0
m.restrict <- lm(y ~ y0)

for(i in 1:reps) {
 perm <- sample(1:n)

#Permute the dose, but not the outcome and baseline
 d <- carot$dose[perm]

 m.anova <- lm(y ~ y0 + factor(d))
 m.linear <- lm(y ~ y0 + d)
 m.quad <- lm(y ~ y0 + poly(d,2))
 m.dichot <- lm(y ~ y0 + (d>0))
 m.dilin <- lm(y ~ y0 + d + (d>0))

 p.vals[i,1:5] <- c(anova(m.anova)[2,"Pr(>F)"], anova(m.linear)[2,"Pr(>F)"],  anova(m.quad)[2,"Pr(>F)"],  anova(m.dichot)[2,"Pr(>F)"], anova(m.restrict, m.dilin)[2,"Pr(>F)"])
}

# Proportion significant by model
apply(p.vals<0.05, 2, mean)

1.96*sqrt(.05*(1-.05)/reps)  # Monte Carlo error for an estimated rate of 0.05


# Distribution of the number of models significant per simulated trial
table(apply(p.vals<.05, 1, sum)) / reps
# Proportion where at least 1 model was significant
p.hat <- sum(apply(p.vals<.05, 1, sum) >= 1) / reps

p.hat
p.hat + c(-1.96,1.96)*sqrt(p.hat*(1-p.hat)/reps)
  • Individual Model Results

    • Empirical type I error for each method of analysis individually
    Model             Emp. Type-I error
    ANOVA             0.049
    Linear            0.046
    Quadratic         0.046
    Dichotomized      0.050
    Dichot + linear   0.041
  • Multiple comparison issues

    • With 5 hypothesis tests at a nominal 0.05 level, experiment-wise error rate is at most \(0.25\) (\(0.05 \times 5\))

    • Worst-case assumes that all tests are mutually exclusive

      • e.g. if the linear dose-response model is significant, no other model is significant on the same dataset

      • In fact, the tests will be correlated

    • How many of the 1000 simulated trials had at least one model with a \(p\)-value \(< 0.05\)?

      • From the simulation, I found this to be \(122\) or \(12.2\%\)

      • Note that there is error in this estimate (due to the simulation randomness)

        • 95% CI: [\(10.2\%, 14.2\%\)]
  • General statistical issues

    • The true Type I error rate for such data-driven analyses will depend on several factors

      • The number of tests performed

      • The models considered

        • Similar models will tend to reject the null hypotheses on the same dataset
      • The distribution of the data

        • In particular, heavy-tailed distributions decrease the concordance between the tests
  • When you are considering multiple models, the conclusions are less strong

    • The p-values (or other metrics) can still be useful in ordering the associations

    • e.g. “Among all of the models considered, it appears as if SNP X is the most strongly associated with CVD”

      • Would be useful to put a CI around this ranking as well

5.2 Post hoc adjustments for multiple comparisons

  • In frequentist reasoning, we try to ensure that our error rate is held at some level \(\alpha\)

    • When only considering one decision, this is relatively easy

    • When making multiple decisions, we must consider the experiment-wise error rate

  • In the worst case scenario, an error rate of \(\alpha\) on each decision could lead to an experiment-wise error rate that is as high as \(k \times \alpha\)

    • Such would be the case if all of our errors were mutually exclusive
  • If all errors were independent of each other, then the experiment-wise error rate is

    • \(1 - (1 - \alpha)^k\)
  • Experiment-wise error rates (\(\alpha = 0.05\) at each decision)

    Number of comparisons   Worst case scenario   Independent errors
    1                       0.0500                0.0500
    2                       0.1000                0.0975
    3                       0.1500                0.1426
    5                       0.2500                0.2262
    10                      0.5000                0.4013
    20                      1.0000                0.6415
    50                      1.0000                0.9231
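
The table can be reproduced directly:

Code
# Experiment-wise error rates at alpha = 0.05 per decision
alpha <- 0.05
k <- c(1, 2, 3, 5, 10, 20, 50)
data.frame(comparisons = k,
           worst.case  = pmin(k * alpha, 1),   # k * alpha, capped at 1
           independent = 1 - (1 - alpha)^k)
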
  • When making multiple comparisons that all address the same scientific question, we may adjust our level of significance to protect the experiment-wise error rate

    • The problem with this approach is that it does not adjust for any bias in the parameter estimates
  • Bonferroni Correction

    • Assumes the worst case scenario

    • When making \(k\) comparisons, either

      • Test individual \(p\)-values against \(\frac{\alpha}{k}\), or

      • Multiply \(p\)-values by \(k\) and compare to \(\alpha\) (capping the adjusted \(p\)-values at 1); see the R illustration below

  • Bonferroni is easy and it can be applied in all settings

    • Extremely conservative when the statistics from various tests are positively correlated
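
A quick R illustration with made-up \(p\)-values from \(k = 5\) tests:

Code
p <- c(0.003, 0.012, 0.031, 0.20, 0.44) # hypothetical p-values
p.adjust(p, method = "bonferroni")      # multiplies each p by k, capped at 1
pmin(length(p) * p, 1)                  # the same adjustment by hand
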
  • Many other varieties of adjustment for multiple comparisons exist

    • Tukey, Scheffé, etc.

    • None are great

    • Did they really adjust for all of the comparisons they made? Probably not.

    • My strong preference is to avoid multiple comparisons in the first place

      • If there was some model fitting involved to get to the final model, acknowledge that fact in the paper

      • Understand the science

      • Avoid data-driven approaches when you care about correct statistical inference (CIs and p-values)