The NHEFS survey was designed to investigate the relationships between clinical, nutritional, and behavioural factors assessed in the first National Health and Nutrition
Examination Survey (NHANES I) and subsequent morbidity, mortality, and hospital
utilization, as well as changes in risk factors, functional limitation, and institutionalization. For more information see http://www.cdc.gov/nchs/nhanes/nhefs/nhefs.htm.
This question will involve using these data to estimate the average causal effect
of smoking cessation on weight gain.
(a) Individuals were classified as treated if they reported being smokers at baseline
in 1971-75 and having quit smoking in the 1982 survey. The latter implies that
the individuals included in our study did not die and were not otherwise lost to
follow-up between baseline and 1982 (otherwise they would not have been able
to respond to the survey). That is, we selected individuals into our study
conditional on an event (responding to the 1982 survey) that occurred after the
start of smoking cessation. If smoking cessation affects the probability of selection
into the study, we might have selection bias (Hernán and Robins, 2014, Chapter 12,
page 11).
Would a randomized experiment of smoking cessation have this problem? How
could a randomized experiment of smoking cessation be designed? What is
the major difference between the latter randomized experiment and this study
(NHEFS survey)?
(b) Should a statistician be concerned that using the NHEFS data to compare weight
loss in the group of subjects that quit smoking versus those that did not quit
smoking is biased? If yes then state why you think the comparison might be
biased, otherwise state why the comparison is unbiased.
(c) Use R to estimate the propensity score for each subject in the study. Use
the variables sex, race, age, education.code, smokeintensity, smokeyrs, exercise,
active, and wt71 as covariates. After calculating the propensity score, use the Match
function in R to match subjects on the propensity score. Does the balance
between the two groups improve after matching? Hand in your R code and
output.
(d) Estimate the effect of smoking cessation on weight loss using propensity score
matching. Did the propensity score matching reduce the bias in estimating the treatment effect?
What assumption can you make to conclude that smoking cessation causes weight
loss? Do you think this assumption is valid? Briefly explain. Hand in your R
code and output.

prop.model<-glm(qsmk~sex+race+age+education.code+smokeintensity+smokeyrs+exercise+active+wt71, family = binomial(), data = nhefshwdat)

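We fit a generalized linear model (logistic regression) to the full sample, with quitting smoking (qsmk) as the response and sex, race, age, education, and the other covariates (nine in total) as predictors, and then estimate each subject's probability of quitting.

The fitted model describes how the binary quitting indicator depends on the covariates; the relationship is linear on the log-odds scale.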
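
Written out, the model fitted by glm above can be stated as follows. The symbols e(x) for the propensity score and β for the coefficients are notation introduced here, not taken from the assignment:

\[
e(x) = \Pr(\text{qsmk} = 1 \mid X = x) = \frac{\exp(\beta_0 + \beta^\top x)}{1 + \exp(\beta_0 + \beta^\top x)},
\qquad
\log\frac{e(x)}{1 - e(x)} = \beta_0 + \beta^\top x .
\]

Calling predict(prop.model, type = "response") returns the estimate of e(x) for each subject.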

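# Use the fitted model to compute, for each subject, the estimated probability of the treatment actually received: P(quit) for those who quit and 1 - P(quit) for those who did not. The result is stored in p.qsmk.obs.
nhefshwdat$p.qsmk.obs <- ifelse(nhefshwdat$qsmk == 0,
                                1 - predict(prop.model, type = "response"),
                                predict(prop.model, type = "response"))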

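X <- fitted(prop.model)  # estimated propensity score for each subject in nhefshwdat

Y <- nhefshwdat$wt82_71  # outcome: weight change between 1971 and 1982

Tr <- nhefshwdat$qsmk  # treatment indicator: 1 = quit smoking, 0 = did not quit

library(Matching)  # load the Matching package

# For each subject who quit, find the continuing smoker with the closest propensity score (M = 1 gives one-to-one matching) and compare their weight changes. By default Match estimates the ATT and matches with replacement, so the same control can be matched to more than one quitter.
rr <- Match(Y = Y, Tr = Tr, X = X, M = 1)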

summary(rr)  # print the estimated effect, its standard error, and the number of matched observations
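
Part (c) also asks whether balance improves after matching, which the code above does not check. The following is a minimal base-R sketch of such a check, run on simulated data rather than the NHEFS file. The data frame dat, its two covariates, the simulated coefficients, and the helper smd are all illustrative assumptions, and the hand-rolled nearest-neighbour step is a simplified stand-in for Match, not a reproduction of it.

```r
# Simulated stand-in for the NHEFS data (hypothetical values, not real results)
set.seed(1)
n <- 500
age <- rnorm(n, 45, 10)
wt71 <- rnorm(n, 70, 12)
# Older subjects are made more likely to quit, so the groups start out imbalanced
qsmk <- rbinom(n, 1, plogis(-1.5 + 0.05 * (age - 45)))
dat <- data.frame(qsmk, age, wt71)

# Propensity score: estimated P(qsmk = 1 | covariates)
ps <- fitted(glm(qsmk ~ age + wt71, family = binomial(), data = dat))

# 1:1 nearest-neighbour matching on the propensity score, with replacement
treated <- which(dat$qsmk == 1)
control <- which(dat$qsmk == 0)
matched <- sapply(treated, function(i) {
  control[which.min(abs(ps[control] - ps[i]))]
})

# Standardized mean difference of a covariate between two groups
smd <- function(x1, x0) {
  (mean(x1) - mean(x0)) / sqrt((var(x1) + var(x0)) / 2)
}

smd_before <- smd(dat$age[treated], dat$age[control])
smd_after  <- smd(dat$age[treated], dat$age[matched])
print(c(before = smd_before, after = smd_after))
```

A standardized mean difference closer to zero after matching indicates better balance on that covariate. With the objects created earlier, MatchBalance(qsmk ~ age + wt71, data = nhefshwdat, match.out = rr) from the Matching package prints a comparable before/after balance table; the formula can list all of the covariates used in prop.model.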