Confounding, Effect Modification, and Mediation

Lecture 06

Chris Slaughter

October 30, 2025

Code
library(ggplot2)
tryCatch(source('pander_registry.R'), error = function(e) invisible(e))

1 Overview

  • Most often scientific questions are translated into comparing the distribution of some response variable across groups of interest

  • Groups are defined by the predictor of interest (POI). Regression methods allow for either categorical or continuous predictors of interest

    • Categorical predictors of interest: Treatment or control, knockout or

      wild type, ethnic group

    • Continuous predictors of interest: Age, BMI, cholesterol, blood pressure

  • Often we need to consider additional variables other than the predictor of interest to answer our scientific question

  • Covariates other than the predictor of interest are included in the regression model as

    • Effect modifiers

    • Confounders

    • Mediating variables

    • Precision variables

    • (Not necessarily mutually exclusive)

2 Effect Modification

  • The association between the Response and the Predictor of Interest differs in strata defined by the effect modifier

  • Statistical term: “Interaction” between the effect modifier and the POI

2.1 Effect modification depends on the measure of effect that you choose

  • Choice of summary measure: mean, median, geometric mean, odds, hazard

  • Choice of comparisons across groups: differences, ratios

2.2 Examples of Effect Modification

2.2.1 Example 1: Serum LDL by smoking (modified by gender?)

           Mean, Women   Mean, Men   Median, Women   Median, Men
No Smoke   120           122         120             115
Smoke      133           122         133             124
Diff       -13           0           -13             -9
Ratio      0.90          1.00        0.90            0.93
  • Effect modification by gender exists if the effect of smoking in males is different from the effect of smoking in females

    • Compare the diff (or ratio) for the mean (or median) in women to the diff (or ratio) for the mean (or median) in men
  • Effect modification for the mean, not as much for the median

    • The statement holds for both the difference and the ratio of the mean/median (see the sketch below)
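
A minimal R sketch that reproduces the differences and ratios above from the summary values in the table (not from raw data):

Code
# Summary values from Example 1: serum LDL by smoking status, stratified by gender
ldl <- data.frame(
  summary  = c("Mean", "Mean", "Median", "Median"),
  gender   = c("Women", "Men", "Women", "Men"),
  no_smoke = c(120, 122, 120, 115),
  smoke    = c(133, 122, 133, 124)
)
# Effect of smoking within each gender, on the difference and ratio scales
ldl$diff  <- ldl$no_smoke - ldl$smoke
ldl$ratio <- round(ldl$no_smoke / ldl$smoke, 2)
ldl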

2.2.2 Example 2: Creatinine by stroke (modified by gender?)

            Mean, Women   Mean, Men   Median, Women   Median, Men
No Stroke   0.72          1.08        0.7             1.1
Stroke      1.01          1.51        1.0             1.5
Diff        -0.29         -0.43       -0.3            -0.4
Ratio       0.71          0.72        0.70            0.73
  • Effect modification for difference, not really for ratio

    • True for Mean or median

2.2.3 Example 3: Stroke by smoking (modified by gender?)

           Proportion, Women   Proportion, Men   Odds, Women   Odds, Men
No Smoke   0.10                0.16              0.03          0.19
Smoke      0.16                0.26              0.19          0.35
Diff       -0.06               -0.10             -0.16         -0.16
Ratio      0.62                0.62              0.16          0.54
  • Proportion: No effect modification for ratio, small amount for difference

  • Odds: No effect modification for difference, yes for ratio

2.2.4 Example 4: Stroke by smoking (modified by CVD?)

           Proportion, None   Proportion, CVD   Odds, None   Odds, CVD
No Smoke   0.02               0.33              0.02         0.50
Smoke      0.04               0.50              0.04         1.00
Diff       -0.02              -0.17             -0.02        -0.50
Ratio      0.50               0.67              0.50         0.50
  • Effect Modification?

    • Proportion: Yes for ratio, yes for difference

    • Odds: Yes for difference, no for ratio

2.2.5 Example 5: CHD by current smoking (modified by gender?)

           Proportion, Women   Proportion, Men   Odds, Women   Odds, Men
No Smoke   0.18                0.26              0.22          0.36
Smoke      0.05                0.24              0.05          0.32
Diff       0.13                0.02              0.17          0.03
Ratio      3.60                1.08              4.17          1.11
  • Effect Modification?

    • Proportion: Yes for ratio, yes for difference

    • Odds: Yes for difference, yes for ratio

2.2.6 Example 6: CHD by ever smoke (modified by gender?)

              Proportion, Women   Proportion, Men   Odds, Women   Odds, Men
Never Smoke   0.16                0.25              0.19          0.33
Ever Smoke    0.16                0.26              0.19          0.35
Diff          0.00                -0.01             0.00          -0.02
Ratio         1.00                0.96              1.00          0.95
  • Effect Modification?

    • Proportion: No for ratio, no for difference

    • Odds: No for difference, no for ratio

2.2.7 Summary comments on examples

  • If there is an effect, we will see effect modification on at least one of the difference and ratio scales

  • If there is no effect (Example 6), we will see no effect modification on either the difference or the ratio scale

  • In real world, will usually see effect modification on both scales. The real question is “Is the effect modification scientifically meaningful?”

    • If we find there is important effect modification, science will go forward estimating effects separately

    • Models with interaction terms are useful for testing if effect modification is present (statistically)

  • Aside: Be careful when comparing two ratios

    • How close are two ratios?

      • 0.20 and 0.25 VERSUS 5.0 and 4.0?

      • 0.10 and 0.15 VERSUS 10.0 and 6.7?

    • Compare the ratio of the ratios, not their difference

      • e.g. \(0.25/0.20 = 1.25\) and \(5.0/4.0 = 1.25\), so each pair is equally far apart on the ratio scale; likewise \(0.15/0.10 = 1.5\) and \(10.0/6.7 \approx 1.5\)

    • We might be tempted to consider two ratios more different when both are \(>1\) than when both are \(<1\), but that would be wrong

2.3 Analysis of Effect Modification

  • When the scientific question involves effect modification

    • Conduct analysis within each stratum separately

    • If we want to estimate the degree of effect modification or test its existence, use a regression model including

      • Predictor of interest (main effect)

      • Effect modifying variable (main effect)

      • A covariate modeling the interaction (usually a product term; see the sketch below)
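
A minimal sketch of such a model in R; the data frame dat and the variables ldl, smoke, and gender are hypothetical stand-ins:

Code
# Main effects plus interaction: smoke * gender expands to smoke + gender + smoke:gender
fit <- lm(ldl ~ smoke * gender, data = dat)
summary(fit)   # the smoke:gender coefficient estimates the degree of effect modification

# Stratum-specific analyses, as when the question itself concerns effect modification
fit_w <- lm(ldl ~ smoke, data = subset(dat, gender == "Women"))
fit_m <- lm(ldl ~ smoke, data = subset(dat, gender == "Men"))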

2.3.1 Impact of ignoring effect modification

  • By design or mistake, we sometimes do not model effect modification

  • Might perform

    • Unadjusted analysis: POI only

    • Adjusted analysis: POI and third variable, but no interaction term

  • If effect modification exists, an unadjusted analysis will give different results according to the association between the POI and effect modifier in the sample

  • If the POI and the effect modifier are not associated

    • Unadjusted analysis tends toward an (approximate) weighted average of the stratum specific effects

      • With means, exactly a weighted average

      • With odds and hazards, an approximate weighted average (because they are non-linear functions of the mean)

  • If the POI and the effect modifier are associated in the sample

    • The “average” effect is confounded and thus unreliable (variables can be both effect modifiers and confounders)
  • If effect modification exists, an analysis adjusting only for the third variable (but no interaction) will tend toward a weighted average of the stratum specific effects

    • Hence, an association in one stratum and not the other will make an adjusted analysis suggest an overall association (provided the sample size is large enough)
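
A small simulation sketch of this behavior, with hypothetical data in which the POI has an effect in one stratum and none in the other, and the POI is not associated with the effect modifier:

Code
set.seed(42)
n <- 400
z <- rep(0:1, each = n/2)           # effect modifier, not associated with the POI
x <- rnorm(n)                       # predictor of interest
y <- 2 * x * z + rnorm(n)           # slope of x is 0 when z = 0 and 2 when z = 1

coef(lm(y ~ x))["x"]                # unadjusted: roughly a weighted average of 0 and 2
coef(lm(y ~ x + z))["x"]            # adjusted without interaction: also a weighted average
coef(lm(y ~ x * z))[c("x", "x:z")]  # interaction model recovers the stratum-specific slopes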

3 Confounding

3.1 Simpson’s Paradox

  • Confounding has its roots in Simpson’s Paradox

  • Given binary variables \(Y\) (response), \(X\) (POI), and \(Z\) (strata) it is possible to have …

\[\textrm{Pr}(Y=1 | X=1, Z=1) > \textrm{Pr}(Y=1 | X=0, Z=1)\] \[\textrm{Pr}(Y=1 | X=1, Z=0) > \textrm{Pr}(Y=1 | X=0, Z=0)\]

… but to also have …

\[\textrm{Pr}(Y=1 | X=1) < \textrm{Pr}(Y=1 | X=0)\]

3.1.1 Example: Probability of death (Y) at two hospitals (X) stratified by poor patient condition (Z)

  • Question: Which hospital do you want to be treated at?
    • Consider the results overall (averaging over Z) and conditional on Z below
Overall          Died   Survived   Death Rate
Hospital A       16     784        2.0%
Hospital B       63     2037       3.0%

Good Condition   Died   Survived   Death Rate
Hospital A       8      592        1.3%
Hospital B       6      594        1.0%

Poor Condition   Died   Survived   Death Rate
Hospital A       8      192        4.0%
Hospital B       57     1443       3.8%
  • Ignoring condition, Hospital B has a higher death rate. However, within both poor and good condition, Hospital B has a lower death rate.

    • Poor condition is a confounder. Hospital B has more subjects with poor condition and subjects with poor condition have a higher death rate.
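
A minimal R sketch reproducing the stratum-specific and overall death rates from the counts above:

Code
# Counts from the hospital example, by hospital and patient condition
hosp <- data.frame(
  hospital  = c("A", "A", "B", "B"),
  condition = c("Good", "Poor", "Good", "Poor"),
  died      = c(8, 8, 6, 57),
  survived  = c(592, 192, 594, 1443)
)
hosp$rate <- with(hosp, died / (died + survived))   # stratum-specific death rates

# Overall death rates, averaging over condition: the comparison reverses
overall <- aggregate(cbind(died, survived) ~ hospital, data = hosp, FUN = sum)
overall$rate <- with(overall, died / (died + survived))
hosp
overall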

3.2 Definition of Confounding

  • The association between a predictor of interest and the response is confounded by a third variable if

    • The third variable is associated with the predictor of interest in the sample, AND

    • The third variable is associated with the response

      • Causally (in truth)

      • In groups that are homogeneous with respect to the predictor of interest

      • Not in the causal pathway of interest

  • We must consider our belief about the causal relationships among the measured variables

    • There is no statistical test for causality

    • Inference about causation comes only from the study design

    • BUT, consideration of the causal relationships helps us to decide which statistical questions to answer

  • Classic confounder

    • A clear case of confounding occurs when some third variable is a “cause” of both the POI and response
    • We generally adjust for such a confounder

3.2.1 Directed Acyclic Graph

flowchart LR
  X[Predictor\nof interest] -- Causal? --> Y[Outcome]
  X[Predictor\nof interest] <-- Association --> Z[Confounder]
  Z[Confounder] -- Causal --> Y[Outcome]

  • Example: Ice cream (POI), murder rate (outcome), and temperature (confounder) in New York City during the summer

flowchart LR
  X[Ice Cream] -- Causal? --> Y[Murder Rate]
  X[Ice Cream] <-- Association --> Z[Air temperature]
  Z[Air temperature] -- Causal --> Y[Murder Rate]

3.2.2 Causal pathways

A variable in the causal pathway of interest
  • Not a confounder, so we would not adjust for such a variable

  • If we did adjust, we would lose ability to detect associations between the POI and the outcome

  • Example: Second hand smoke (POI), stunted growth (variable in the causal pathway), FEV1 (outcome)

    • Scientific question is about the impact of smoking on lung function

      • Stunted growth addresses lung anatomy, not lung function, which is not what we care about here

flowchart LR
  X[Second hand smoke] --> Y[FEV1]
  X[Second hand smoke] --> Z[Stunted growth]
  Z[Stunted growth] --> Y[FEV1]

A variable in the causal pathway not of interest
  • However, we do want to adjust for a variable in a causal pathway that is not of interest
    • Example: Work stress causing ulcers through hormonal effects versus through alcoholism
    • Directed Acyclic Graph
      • We can adjust for alcoholism to estimate the path through hormonal effects
      • Alternatively, we can adjust for hormonal effects to estimate the effect through alcoholism

flowchart LR
  X[Work stress] --> W[Hormonal Effects]
  X[Work stress] --> Z[Alcoholism]
  W[Hormonal Effects] --> Y[Ulcers]
  Z[Alcoholism] --> Y[Ulcers]

Surrogate for response
  • Adjustment for a surrogate is a bad idea

  • As the name implies, surrogates are a substitute for the response variable

  • Directed Acyclic Graph where forced vital capacity (FVC) is a surrogate for forced expiratory volume in one second (FEV1)

flowchart LR
  X[Second hand smoke] --> Z[FVC]
  Z[FVC] --> Y[FEV1]

Many other (complicated) patterns possible

3.3 Diagnosing Confounding

  • Confounding typically produces a difference between unadjusted and adjusted analyses

    • This symptom is not proof of confounding

      • Such a difference can occur when there is no confounding

      • Symptom is more indicative of confounding when modeling means (linear regression) than when modeling odds (logistic regression) or hazards (Cox, proportional hazards regression)

  • Estimates of association from unadjusted analysis are markedly different from estimates of association from adjusted analysis

    • Association within each stratum is similar to each other, but different from the association in the combined data
  • In linear regression, differences between adjusted and unadjusted analyses are diagnostic of confounding

    • Precision variables tend to change standard errors, but not slope estimates

    • Effect modification would show differences between adjusted analysis and unadjusted analysis, but would also show different associations in the strata

  • More difficult to diagnose confounding with non-linear functions of the mean

    • Common non-linear functions: Odds (odds ratios), hazards (hazard ratios)

    • May show the symptoms of confounding when confounding is not present

    • Adjusting for precision variables can appear to be confounding

    • In logistic and PH regression, difference between adjusted and unadjusted analyses are more difficult to judge

      • Comparison in more homogeneous groups (i.e. after adjustment for a precision variable) will drive slope estimates away from the null
  • Example: Suppose you have a sample where 50% of the subjects die

    • What is the variability? \(p(1-p) = 0.25\)

    • We can reduce this variability by changing \(p\), the probability of death

    • Estimate \(p\) in different strata. One stratum may have a higher \(p\), another a lower \(p\).

      • e.g. Stratum 1 has a 20% death rate and Stratum 2 an 80% death rate, so within each stratum \(p(1-p) = 0.16 < 0.25\)
    • By making the estimate more precise, we have also impacted the mean

4 Mediating variables

  • A mediating variable is a predictor hypothesized to lie on the causal pathway between a predictor of interest and the outcome

  • Whether or not to adjust for a mediating variable depends on the scientific question of interest

    • Without adjustment, we are estimating the total (causal) effect of the predictor of interest on the outcome via all pathways

    • If we adjust for a mediator, we are estimating the direct (causal) effect via other pathways that do not involve the mediator

  • If a potential mediator is identified a priori on scientific grounds, we can estimate the direct effect via pathways other than the mediator, the indirect effect through the mediator, and the degree of mediation (e.g. proportion mediated).

  • Example: Depression (predictor) and Crohn’s Disease activity (outcome) with possible mediators being physical activity, smoking, and sleep quality

    • Mediation models were considered using baseline, 6-month, and 12-month follow-up data

    • Depression at baseline was the predictor of interest, mediators were assessed at 6 months, and disease activity at 12 months was the outcome.

    • To establish and estimate the mediation effect, we utilized the four steps originally outlined by Baron and Kenny 1

      • First, we tested that there was a significant association between depression at baseline and disease activity at 12 months, to establish that there was an effect to be mediated

      • Second, we estimated the correlation between depression at baseline and each of the potential mediators measured at 6 months

      • Third, we tested if each of the mediators at 6 months was associated with disease activity at 12 months while controlling for depression at baseline.

      • Fourth, the mediation package in R was used to estimate the quantities for causal mediation analysis: the average causal mediation effect (indirect effect), the average direct effect, the proportion mediated, and the total effect. This step combines estimates from the mediator model (mediator at 6 months as the outcome, depression at baseline as the predictor) and from the outcome model (disease activity at 12 months as the outcome, with the mediator at 6 months and depression at baseline as predictors of interest, controlling for disease activity at baseline) efficiently and with appropriate standard error estimates. A sketch of this step appears at the end of this section.

    • Results

1 Baron RM, Kenny DA. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol. 1986 Dec;51(6):1173-82. doi: 10.1037//0022-3514.51.6.1173. PMID: 3806354.

Multiple mediation model path diagram and estimates from the structural equation model. Part A shows the total effect of depression at baseline on disease activity at 12 months: per one-unit increase in depression, disease activity increases by 2.90. Part B shows the associations of depression at baseline, mediators at 6 months, and disease activity at 12 months. Individual mediation effects are found by multiplying a by b for each mediator. Of the 2.90 total effect, 0.005 went through sleep, 0.138 went through smoking, and 0.107 went through activity. Summing these three values, a total of 0.250 (8.6% mediated) went through the three mediators.
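
A minimal sketch of step four using the mediation package; the data frame crohns and the variable names (dep0, sleep6, cdai0, cdai12) are hypothetical stand-ins for the study variables, and one such pair of models is fit per mediator:

Code
library(mediation)

# Mediator model: mediator at 6 months regressed on depression at baseline
med.fit <- lm(sleep6 ~ dep0, data = crohns)
# Outcome model: disease activity at 12 months regressed on the mediator and depression,
# controlling for disease activity at baseline
out.fit <- lm(cdai12 ~ sleep6 + dep0 + cdai0, data = crohns)

# Combine the two models; ACME = indirect effect, ADE = direct effect
med.out <- mediate(med.fit, out.fit, treat = "dep0", mediator = "sleep6", sims = 1000)
summary(med.out)   # ACME, ADE, total effect, and proportion mediated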

5 Precision Variables

5.1 Overview

  • Sometimes the scientific question to be answered is chosen based on which questions can be answered most precisely

  • In general, questions can be answered more precisely when the within group distribution is less variable

  • Comparing groups that are similar with respect to other important risk factors decreases variability

  • A precision variable is associated with the response but independent of the predictor of interest (the putative cause of the response)

  • If we adjust for such a variable, we tend to gain precision

  • Directed Acyclic Graph:

flowchart LR
  X[Predictor] --> Y[Outcome]
  Z[Precision] --> Y[Outcome]

5.2 Adjusting for Precision Variables

5.2.1 Precision for Difference of Independent Means

  • Independent observations where group 1 has a different mean and variance than group 2

    • \(\textrm{ind } Y_{ij} \sim (\mu_j, \sigma_j^2), j = 1, 2; i = 1, \ldots, n_j\)

    • \(n = n_1 + n_2\); \(r = n_1 / n_2\)

    • \(\theta = \mu_1 - \mu_2\), \(\hat{\theta} = \overline{Y}_1 - \overline{Y}_2\)

    • \(V = (r+1)(\frac{\sigma_1^2}{r} + \sigma_2^2)\)

    • \(se(\hat{\theta}) = \sqrt{\frac{V}{n}} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\)

  • Might control for some variable in order to decrease the within group variability

    • Restrict population sampled

    • Standardize ancillary treatments

    • Standardize measurement procedure
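
A small simulation sketch checking the standard error formula, using hypothetical means, standard deviations, and group sizes:

Code
set.seed(1)
n1 <- 60; n2 <- 40
mu1 <- 10; mu2 <- 12
s1 <- 3; s2 <- 5

# Analytic standard error of the difference in means
sqrt(s1^2/n1 + s2^2/n2)

# Empirical standard error from repeated sampling
diffs <- replicate(5000, mean(rnorm(n1, mu1, s1)) - mean(rnorm(n2, mu2, s2)))
sd(diffs)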

5.2.2 Precision for Linear Regression

  • Independent continuous outcome associated with covariate (\(X\))

    • \(\textrm{ind } Y_i | X_i \sim (\beta_0 + \beta_1 X_i, \sigma^2_{Y|X}), i = 1, \ldots, n\)

    • \(\theta = \beta_1, \hat{\theta} = \hat{\beta_1}\) from LS regression

    • \(V = \frac{\sigma^2_{Y|X}}{\textrm{Var}(X)}\)

    • \(se(\hat{\theta}) = \sqrt{\frac{\hat{\sigma}^2_{Y|X}}{n \hat{\textrm{Var}}(X)}}\)

  • Adjusting for covariates (\(W\)) decreases the within group standard deviation

    • \(\textrm{Var}(Y | X)\) versus \(\textrm{Var}(Y | X, W)\)
  • Independent continuous outcome associated with covariate (\(X\)) and precision variable (\(W\))

    • \(\textrm{ind } Y_i | X_i, W_i \sim (\beta_0 + \beta_1 X_i + \beta_2 W_i, \sigma^2_{Y|X,W}), i = 1, \ldots, n\)

    • \(\theta = \beta_1, \hat{\theta} = \hat{\beta_1}\) from LS regression

    • \(V = \frac{\sigma^2_{Y|X,W}}{\textrm{Var}(X)(1-r^2_{X,W})}\)

    • \(se(\hat{\theta}) = \sqrt{\frac{\hat{\sigma}^2_{Y|X,W}}{n \hat{\textrm{Var}}(X)(1-r^2_{X,W})}}\)

    • \(\sigma^2_{Y|X,W} = \sigma^2_{Y|X} - \beta_2^2 \textrm{Var}(W | X)\)
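
A quick simulation sketch with hypothetical coefficients, showing the smaller standard error for \(\hat{\beta}_1\) once a precision variable \(W\) (independent of \(X\)) is included:

Code
set.seed(2)
n <- 500
X <- rnorm(n)
W <- rnorm(n)                  # precision variable, independent of X
Y <- 1 + 2*X + 3*W + rnorm(n)

# Standard error of the X coefficient without and with adjustment for W
summary(lm(Y ~ X))$coefficients["X", "Std. Error"]
summary(lm(Y ~ X + W))$coefficients["X", "Std. Error"]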

5.2.3 Precision for Difference of Proportions

  • When analyzing proportions (means), the mean variance relationship is critical

    • Precision is greatest when proportion is close to 0 or 1

    • Greater homogeneity of groups makes results more deterministic (this is the goal, at least)

  • Independent binary outcomes

    • \(\textrm{ind } Y_{ij} \sim B(1, p_j), i = 1, \ldots, n_j; j = 1, 2\)

    • \(n = n_1 + n_2; r = n_1 / n_2\)

    • \(\theta = p_1 - p_2\), \(\hat{\theta} = \hat{p}_1 - \hat{p_2} = \overline{Y}_1 - \overline{Y}_2\)

    • \(\sigma^2_j = p_j(1-p_j)\)

    • \(V = (r+1)(\frac{\sigma_1^2}{r} + \sigma_2^2)\)

    • \(se(\hat{\theta}) = \sqrt{\frac{V}{n}} = \sqrt{\frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}}\)
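
A brief sketch with hypothetical group sizes and proportions, showing how the mean-variance relationship \(p(1-p)\) drives the standard error of the difference:

Code
n1 <- 200; n2 <- 200

p1 <- 0.5; p2 <- 0.5          # maximal variance: p(1 - p) = 0.25
sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2)

p1 <- 0.1; p2 <- 0.1          # proportions near 0: p(1 - p) = 0.09, so a smaller SE
sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2)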

5.2.4 Precision for Odds

  • When analyzing odds (a nonlinear function of the mean), adjusting for precision variables results in more extreme estimates

  • \(\textrm{Odds} = \frac{p}{1-p}\)

  • The odds computed from the average of the stratum-specific \(p\) is not the average of the stratum-specific odds

  • Example: Stroke by smoking (in CVD strata)

    • No association between smoking and CVD in the sample: the same proportion of smokers among subjects with and without CVD

    • CVD (precision variable) associated with stroke (outcome)

      • CVD is not a confounder, but it is clearly a precision variable
    • Within each CVD stratum, the (adjusted) odds ratio for stroke by smoking is 2.0.

    • Overall, the unadjusted odds ratio is 1.8. That is, the unadjusted odds ratio is attenuated toward the null compared to the adjusted odds ratios

      No CVD      N       p       Odds
      Smoke       1000    0.04    0.04
      Non-Smoke   10000   0.02    0.02
      Ratio                       2.0

      CVD         N       p       Odds
      Smoke       100     0.50    1.00
      Non-Smoke   1000    0.33    0.50
      Ratio                       2.0

      Overall     N       p       Odds
      Smoke       1100    0.082   0.09
      Non-Smoke   11000   0.048   0.05
      Ratio                       1.8
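
A minimal sketch reproducing the adjusted and unadjusted odds ratios from the table above:

Code
# Group sizes and stroke probabilities from the example
strata <- data.frame(
  cvd   = c("No", "No", "Yes", "Yes"),
  smoke = c("Yes", "No", "Yes", "No"),
  n     = c(1000, 10000, 100, 1000),
  p     = c(0.04, 0.02, 0.50, 0.33)
)
strata$odds <- strata$p / (1 - strata$p)

# Stratum-specific (adjusted) odds ratios: both approximately 2.0
with(strata, odds[cvd == "No"  & smoke == "Yes"] / odds[cvd == "No"  & smoke == "No"])
with(strata, odds[cvd == "Yes" & smoke == "Yes"] / odds[cvd == "Yes" & smoke == "No"])

# Overall (unadjusted) odds ratio, attenuated toward the null (about 1.8)
p_smoke <- with(strata, sum(n[smoke == "Yes"] * p[smoke == "Yes"]) / sum(n[smoke == "Yes"]))
p_non   <- with(strata, sum(n[smoke == "No"]  * p[smoke == "No"])  / sum(n[smoke == "No"]))
(p_smoke / (1 - p_smoke)) / (p_non / (1 - p_non))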

6 Diagnosing Confounding

6.1 Adjustment for Covariates

  • We include predictors in an analysis for a number of reasons. In order of importance…

    1. Scientific question

      • Predictor of Interest

      • Effect Modifiers

    2. Adjust for confounding

    3. Gain precision

  • Adjustment for covariates changes the question being answered by the statistical analysis

    • Adjustments can be made to isolate associations that are of particular interest

    • When consulting with a scientist, it is often difficult to decide whether the interest in an additional covariate is due to confounding, effect modification, or precision

      • The distinction is important because I tend to treat these variables differently in the analysis
    • Often the scientific question dictates inclusion of particular predictors

      • Predictor of interest: The scientific parameter of interest can be modeled by multiple predictors (e.g. dummy variables, polynomials, splines)

      • Effect Modifiers: The scientific question relates to the detection of effect modification

      • Confounders: The scientific question may be stated in terms of adjusting for known (or suspected) confounders

6.2 Confounder Detection

  • Unanticipated confounding

    • Sometimes we must explore our data to assess whether our results were confounded by some variable

    • Goal is to assess the “independent effect” of the predictor of interest on the outcome

  • Confounders

    • Variables (causally) predictive of the outcome, but not in the causal pathway

      • Best method: Think about the scientific problem beforehand (perhaps draw DAG)

      • Using data, often assessed in the control group

    • Variables associated with the predictor of interest in the sample

      • Note that statistical significance is not relevant here because it addresses associations in the population, whereas confounding depends on the association in the sample
    • Detection of confounding ultimately must rely on our best knowledge about the possible scientific mechanisms

  • Effect of confounding: A confounder can make the association between the predictor of interest and the response variable look…

    • Stronger than the true association

    • Weaker than the true association

    • The complete reverse of the true association (“qualitative confounding”)

7 Graphical Methods for Visualizing Effect Modification, Confounding, and Precision

  • Conduct stratified analysis to distinguish between

    • Effect modifiers

    • Confounders

    • Precision variables

  • Plots most illustrative for continuous outcomes

7.1 Effect Modifiers

  • Estimates of treatment effect differ among strata

    • When analyzing difference of means of continuous data, stratified smooth curves of the data are non-parallel

    • Graphical techniques difficult in other settings

Code
set.seed(1231)
n <- 200
grp <- rep(c(0,1), each=n/2)
X <- runif(n)
# Interaction in the data-generating model: the slope of X is +2 in one group and -2 in the other
emplot <- data.frame(grp=grp,
                     X=X,
                     Y=.1*grp + 2*X - 4*grp*X + rnorm(n)
)
emplot$group <- factor(emplot$grp, levels=0:1, labels=c("Trt","Ctrl"))
# Unadjusted (pooled) smooth of Y on X, ignoring group
ggplot(emplot, aes(x=X, y=Y)) + geom_point() + geom_smooth(se=FALSE) + theme_bw()

Unadjusted association between outcome and predictor shows a flat slope.
Code
# Group-specific smooths of Y on X
ggplot(emplot, aes(x=X, y=Y, color=group, group=group)) + geom_point() + geom_smooth(se=FALSE) + theme_bw()

Adjusted association between outcome and predictor shows an increasing slope in one group and a decreasing slope in the other. This demonstrates effect modification by group.

7.2 Confounders

  • Estimates of treatment effect the same across strata, AND

    • Confounder is causally associated with the response, AND

    • Confounder associated with the POI in the sample

  • When analyzing difference of means of continuous data

    • Stratified smooth curve of data are parallel

    • Distribution of POI differs across strata

    • Unadjusted and adjusted analyses give different estimates

Code
set.seed(1231)
n <- 200
grp <- rep(c(0:3), each=n/4)
# Stratum (grp) is associated with X in the sample: the range of X shifts upward with grp
X <- c(runif(n/4, 0,.4), runif(n/4, 0.2,.6), runif(n/4, .4,.8), runif(n/4, .6, 1))
# grp also affects Y, so it confounds the X-Y association
Y <- X - grp + rnorm(n)

par(mfrow=c(1,2))
plot(X,Y, main="Unadjusted")
lines(lowess(Y~X), lwd=2)

plot(X,Y, pch=(grp+1), main="Adjusted Confounding")
m1 <- lm(Y~X + grp)
segments(x0=0  , x1=.4, y0=coef(m1)[1] + 0*coef(m1)[2],   y1=coef(m1)[1] + 0.4*coef(m1)[2], lwd=2)
segments(x0=0.2, x1=.6, y0=coef(m1)[1] + 0.2*coef(m1)[2]+ 1*coef(m1)[3], y1=coef(m1)[1] + 0.6*coef(m1)[2] + 1*coef(m1)[3], lty=2, lwd=2)
segments(x0=0.4, x1=.8, y0=coef(m1)[1] + 0.4*coef(m1)[2]+ 2*coef(m1)[3], y1=coef(m1)[1] + 0.8*coef(m1)[2] + 2*coef(m1)[3], lty=3, lwd=2)
segments(x0=0.6, x1= 1, y0=coef(m1)[1] + 0.6*coef(m1)[2]+ 3*coef(m1)[3], y1=coef(m1)[1] +   1*coef(m1)[2] + 3*coef(m1)[3], lty=4, lwd=2)

7.3 Precision Variables

  • Estimates of treatment effect the same across strata, AND

    • Variable is causally associated with the response, AND

    • Variable is not associated with the POI in the sample

  • When analyzing difference of means of continuous data

    • Stratified smooth curve of data are parallel

    • Distribution of POI same across strata

    • Unadjusted and adjusted analyses give similar estimates but with smaller standard errors

Code
set.seed(1231)
n <- 200
grp <- rep(c(0:3), each=n/4)
# grp affects Y but is independent of X: a precision variable, not a confounder
X <- runif(n)
Y <- 3*X + 5*grp + rnorm(n)

par(mfrow=c(1,2))
plot(X,Y, main="Unadjusted Precision")
lines(lowess(Y~X), lwd=2)
plot(X,Y, main="Adjusted Precision", pch=(grp+1))
plot(X,Y, main="Adjusted Precision", pch=(grp+1))
m2 <- lm(Y~X + grp)
abline(a=coef(m2)[1], b=coef(m2)[2], lwd=2)
abline(a=coef(m2)[1] + 1*coef(m2)[3], b=(coef(m2)[2]), lty=2, lwd=2)
abline(a=coef(m2)[1] + 2*coef(m2)[3], b=(coef(m2)[2]), lty=3, lwd=2)
abline(a=coef(m2)[1] + 3*coef(m2)[3], b=(coef(m2)[2]), lty=4, lwd=2)