HOMEWORK 5 Ed257 D Rogosa
Due May 4, 2005
1. A criminologist wants to estimate the proportion of U.S.
citizens who live in a home in which firearms are available. The
1991 General Social Survey asked respondents, “Do you have in
your home any guns or revolvers?” Of the respondents, 393
answered “yes” and 583 answered “no.” Construct a 90% confidence
interval for the true proportion of “yes.” Construct an exact CI
using SAS (or your own computation) and compare with the standard
large-sample normal approximation (which should do pretty well
for this example).
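One way to get both intervals in SAS is PROC FREQ with the BINOMIAL option. The following is a minimal sketch, not a required approach; the dataset name (guns) and variable names (response, count) are made up for illustration, and only the two counts come from the problem.

```sas
/* Hypothetical dataset and variable names; the counts are from the problem. */
data guns;
  input response $ count;
  datalines;
yes 393
no 583
;
run;

/* BINOMIAL reports the proportion of 'yes' with asymptotic (Wald) and
   exact (Clopper-Pearson) confidence limits; ALPHA=0.10 gives 90% limits. */
proc freq data=guns;
  weight count;
  tables response / binomial(level='yes') alpha=0.10;
run;
```

For a hand check, the large-sample interval is phat +/- 1.645*sqrt(phat*(1-phat)/n), with phat = 393/976 and n = 976.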
-------------------------------------------------------
2. The Donner Party, a more competitive environment than graduate school
From the Statistical Sleuth:
20.1.1 Survival in the Donner Party—An Observational Study
In 1846 the Donner and Reed families left Springfield, Illinois,
for California by covered wagon. In July, the Donner Party, as it
became known, reached Fort Bridger, Wyoming. There its leaders
decided to attempt a new and untested route to the Sacramento
Valley. Having reached its full size of 87 people and 20 wagons,
the party was delayed by a difficult crossing of the Wasatch
Range and again in the crossing of the desert west of the Great
Salt Lake. The group became stranded in the eastern Sierra Nevada
mountains when the region was hit by heavy snows in late October.
By the time the last survivor was rescued on April 21, 1847, 40
of the 87 members had died from famine and exposure to extreme
cold.
File donner.dat contains the ages and sexes of the adult (over 15
years) survivors and non-survivors of the party. These data were
used by an anthropologist to study the theory that females are
better able to withstand harsh conditions than are males. (Data
from D. K. Grayson, "Donner Party Deaths: A Demographic
Assessment," Journal of Anthropological Research 46 (1990):
223-42.)
a. Use logistic regression to predict survival using age and gender
as predictors. Comment on the results. Display probability and odds
of survival as a function of age and gender. Construct an index plot
of the deviance residuals following NWK Fig. 14.7 (ver4).
How do the fits for survival probabilities compare to those from
separate fits for males and females?
b. For any given age, were the odds of survival greater for women than for men?
Give a point estimate and a 95% confidence interval.
c. From the logistic fit, compare the odds of survival of a woman 50 years
old with those of a woman 20 years old. Give a point estimate and a
95% confidence interval for the odds ratio.
d. The full model in part a contains no interaction term between
age and gender (i.e. the "effect" of gender is the same at all
levels of age). Fit a more complex model including an age x gender
interaction and conduct a statistical test for that term using a
drop-in-deviance test statistic (e.g. as in the Course example
disease data from NWK Ch 14).
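A possible starting point for the fits in parts a through d, sketched in SAS. The column layout of donner.dat is not given in this handout, so the INPUT line below (age, a 0/1 indicator named female, a 0/1 indicator named survived) is an assumption to check against the file, and the output dataset and variable names (fit_add, phat, devres) are hypothetical.

```sas
/* Assumed layout of donner.dat: age, female (1 = female, 0 = male),
   survived (1 = yes, 0 = no). Check the file and adjust the INPUT line. */
data donner;
  infile 'donner.dat';
  input age female survived;
run;

/* Additive model (part a). DESCENDING models Pr(survived = 1).
   CLODDS=WALD with UNITS reports odds ratios and 95% Wald limits for a
   1-unit change in female (part b) and a 30-year change in age (part c).
   OUTPUT saves fitted probabilities and deviance residuals for plotting. */
proc logistic data=donner descending;
  model survived = age female / clodds=wald;
  units age=30 female=1;
  output out=fit_add p=phat resdev=devres;
run;

/* Interaction model (part d): compare its -2 Log L with the additive fit. */
proc logistic data=donner descending;
  model survived = age female age*female;
run;
```

The drop in deviance for part d is the difference between the two "-2 Log L" values (intercept and covariates), referred to a chi-square distribution with 1 degree of freedom.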
------------------------------------------------------------------
3. Poisson Regression
For the Miller Lumber Poisson Regression example
in NWK Section 14.11 the data are in miller.dat:
Y in C1; X1-X5 in C2-C6.
Fit a Poisson regression model using 2 predictors:
Competitor distance, store distance.
Compare this model with a model using all 5 predictors.
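The two Poisson fits could be set up in SAS with PROC GENMOD, roughly as below. This is a sketch under one assumption that should be checked against NWK Section 14.11: that competitor distance and store distance are X4 and X5 in the file. The dataset name (miller) and the lowercase variable names are illustrative.

```sas
/* Column order follows the assignment: Y first, then X1-X5. */
data miller;
  infile 'miller.dat';
  input y x1-x5;
run;

/* Assumption: x4 = competitor distance and x5 = store distance;
   confirm against the variable list in NWK Section 14.11. */
proc genmod data=miller;
  model y = x4 x5 / dist=poisson link=log;
run;

/* Full model with all five predictors. */
proc genmod data=miller;
  model y = x1 x2 x3 x4 x5 / dist=poisson link=log;
run;
```

Each run prints the model deviance in the goodness-of-fit table; the difference between the two deviances can be compared with a chi-square distribution on 3 degrees of freedom (5 predictors versus 2).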
----------------------------------------------------------
4. Logistic Regression: Best Subsets Variable Selection
For the disease data example (i.e. the data in NWK Table 14.3) use
the following SAS code [adjust path to data] to carry out a
best subsets variable selection (via the selection=score
option on the model statement).
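data diseasedat;
  infile 'E:\disease.dat';   /* adjust path to your copy of the data */
  input age ses1 ses2 sector disease;
run;
/* descending: model Pr(disease = 1) rather than Pr(disease = 0) */
proc logistic data=diseasedat descending;
  model disease = age ses1 ses2 sector
        / selection=score;
run;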
Use a drop-in-deviance test statistic to compare the best
(largest score Chi-square) 2-predictor model with the best
3-predictor model.
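For reference, the drop-in-deviance statistic can be written as follows, with R denoting the reduced (2-predictor) model and F the fuller (3-predictor) model; the symbols R, F, p_R and p_F are notation introduced here, not taken from the handout. The chi-square reference distribution applies when the reduced model is nested in the fuller one.

```latex
G^2 \;=\; \mathrm{DEV}(R) - \mathrm{DEV}(F)
    \;=\; -2\ln L(R) \;-\; \bigl(-2\ln L(F)\bigr),
\qquad
G^2 \;\overset{\text{approx.}}{\sim}\; \chi^2_{\,p_F - p_R} \ \text{under } H_0 ,
```

where p_F - p_R is the number of extra parameters in the fuller model (here 1).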
-------------------------------------------------
5. Deviance and computer programmers.
a. Use the NWK definition (14.64 ver4, 14.83 ver5) of a deviance residual to
compute the 25 deviance residuals for the programmer data. Recreate the
index plot for deviance residuals in NWK Fig 14.7 (ver4), or logit class handout.
b. Use equation (14.65 ver4, 14.82 ver5) for total model deviance to verify
that model deviance equals -2*LogLikelihood.
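As a reminder of what is being computed, the standard deviance residual for a binary response Y_i with fitted probability \hat{\pi}_i, and the resulting model deviance, can be written as below. This is the usual textbook form, stated here as an aid; check it against the NWK equations cited above.

```latex
\mathrm{dev}_i \;=\; \operatorname{sign}\!\bigl(Y_i - \hat{\pi}_i\bigr)
  \sqrt{-2\bigl[\,Y_i \ln \hat{\pi}_i + (1 - Y_i)\ln(1 - \hat{\pi}_i)\bigr]},
\qquad i = 1, \dots, 25,

\mathrm{DEV} \;=\; \sum_{i=1}^{25} \mathrm{dev}_i^{\,2}
  \;=\; -2 \sum_{i=1}^{25} \bigl[\,Y_i \ln \hat{\pi}_i + (1 - Y_i)\ln(1 - \hat{\pi}_i)\bigr]
  \;=\; -2 \ln L .
```

Since each Y_i is 0 or 1, only one of the two log terms is nonzero for any given case.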
----------------------------------
end of HW5