Assignment #6 Ed257 Spring 2005 D Rogosa
Due May 24, 2005
-------------------
NOTE:
Problems 8,9,10 are more like didactic extensions of
Lecture than typical HW problems. You may find it equally
useful to work through these with the solutions (at
your convenience).
All references are to "little" Agresti. There are corresponding
sections in the larger Agresti text (with different examples).
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
1. A study on educational aspirations of high school students (S.
Crysdale, Int. J. Compar. Sociol., 16: 19-36 (1975)) measured
aspirations using the scale (some high school, high school
graduate, some college, college graduate). For students whose
family income was low, the counts in these categories were
(9, 44, 13, 10); when family income was middle, the counts were
(11, 52, 23, 22); when family income was high, the counts were
(9, 41, 12, 27).
Part I (basics)
a. Test independence of educational aspirations and family income
using X^2 or G^2. Interpret.
b. Calculate the adjusted residuals and construct a table following
Agresti Table 7.3 (i.e., for the 3x4 table, three entries per cell:
{count, fit under independence, adjusted residual}). Do the adjusted
residuals suggest any association pattern?
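
The Part I quantities can be computed by plug-in. The following is a minimal sketch in plain Python (not the SAS route used in the course), assuming the usual formulas: expected count = row total x column total / n, and adjusted residual = (count - fit) / sqrt(fit x (1 - row proportion) x (1 - column proportion)). The function name `independence_summary` is illustrative only.

```python
import math

# Counts from the problem: rows = family income (low, middle, high),
# columns = aspiration (some HS, HS graduate, some college, college graduate).
counts = [
    [9, 44, 13, 10],
    [11, 52, 23, 22],
    [9, 41, 12, 27],
]

def independence_summary(table):
    """Return (expected, X2, G2, df, adjusted residuals) for a two-way table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Fitted counts under independence.
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    pairs = [(o, e) for orow, erow in zip(table, expected)
             for o, e in zip(orow, erow)]
    x2 = sum((o - e) ** 2 / e for o, e in pairs)               # Pearson
    g2 = 2 * sum(o * math.log(o / e) for o, e in pairs if o > 0)  # likelihood ratio
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    adjusted = [[(o - e) / math.sqrt(e * (1 - r / n) * (1 - c / n))
                 for o, e, c in zip(orow, erow, col_tot)]
                for orow, erow, r in zip(table, expected, row_tot)]
    return expected, x2, g2, df, adjusted

expected, x2, g2, df, adjusted = independence_summary(counts)
print(f"X2 = {x2:.2f}, G2 = {g2:.2f}, df = {df}")
for obs_row, fit_row, res_row in zip(counts, expected, adjusted):
    print([(o, round(f, 1), round(a, 2))
           for o, f, a in zip(obs_row, fit_row, res_row)])
```

Each printed triple is {count, fit under independence, adjusted residual}, the layout of Agresti Table 7.3. Compare X2 and G2 with the chi-square critical value on 6 df (12.59 at the 0.05 level).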
Part II
Following the class example "Linear Association Models for Ordinal Data",
construct a linear association term and include this term in the
log-linear model. Is this model an improvement over the independence
model? How does it compare with the saturated model?
Construct a table following Agresti Table 7.4 (i.e., for
the 3x4 table, three entries per cell: {count, fit under linear
association, adjusted residual}). Demonstrate that 2x2 tables of fits
formed by adjacent cells (e.g., Agresti Fig 7.1) have identical odds
ratios. That is, the linear association model behaves like straight-line
regression for measured variables: the change in the fit from increasing
the predictor by one unit is the same at all values of the predictor.
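
The identical-odds-ratio property can be checked numerically before fitting anything. The sketch below assumes the linear-by-linear form log(mu_ij) = lambda + row_i + col_j + beta * u_i * v_j with equally spaced scores. All parameter values are hypothetical, chosen only for illustration; they are not estimates from the aspirations data.

```python
import math

# HYPOTHETICAL parameter values (not estimates from the data).
lam = 2.0
row_eff = [0.0, 0.3, 0.1]        # one effect per income level
col_eff = [0.0, 1.5, 0.4, 0.6]   # one effect per aspiration level
beta = 0.25                      # linear-by-linear association parameter
u = [1, 2, 3]                    # equally spaced row scores
v = [1, 2, 3, 4]                 # equally spaced column scores

# Fitted counts implied by the model.
mu = [[math.exp(lam + row_eff[i] + col_eff[j] + beta * u[i] * v[j])
       for j in range(len(v))]
      for i in range(len(u))]

def local_odds_ratios(fit):
    """Odds ratio for every 2x2 block of adjacent rows and adjacent columns."""
    return [[fit[i][j] * fit[i + 1][j + 1] / (fit[i][j + 1] * fit[i + 1][j])
             for j in range(len(fit[0]) - 1)]
            for i in range(len(fit) - 1)]

local_ors = local_odds_ratios(mu)
for row in local_ors:
    print([round(x, 4) for x in row])
print("exp(beta) =", round(math.exp(beta), 4))
```

With unit-spaced scores every local odds ratio equals exp(beta), whatever the row and column effects are. Applying `local_odds_ratios` to the fitted counts from your own model run should show the same pattern.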
----------------------------------
2. Most colleges and universities have annual campaigns in which
they ask former graduates to contribute money. For the 1986 to
1987 Providence College fund-raising campaign, statistics were
recorded for the number of people contacted and the number of
donors, categorized by class year. Some of these data are
summarized in the table below (data from Providence College
Fund Year Report 1986-7).
                          Class
                     1961  1966  1971  1976  1981
Contributed           196   266   194   276   333
Did not Contribute    123   226   241   322   568
---------------------------------
Construct a null hypothesis that the probabilities of
contributing are the same for all these 5 classes.
Calculate a table of expected counts under the assumption
that this null hypothesis is true.
Construct a test statistic for this null hypothesis and
carry out a test of the null hypothesis using Type I error rate
0.01.
Carry out a test for linear trend in the proportion contributing,
following the course example
"Trend in 2xC tables, Alcohol and Infant Malformation".
Compare this with the results for the test of independence above.
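
Both tests can be sketched by hand. The code below assumes the trend statistic takes the form M^2 = (n - 1) r^2, where r is the correlation between the 0/1 response and a column score; using class year as the score is an assumption (any equally spaced scores give the same r). The course example may present the computation differently.

```python
import math

years = [1961, 1966, 1971, 1976, 1981]
contributed = [196, 266, 194, 276, 333]
did_not = [123, 226, 241, 322, 568]

totals = [c + d for c, d in zip(contributed, did_not)]
n = sum(totals)
p_pooled = sum(contributed) / n   # common probability under the null

# Expected counts when every class has the same contribution probability.
expected_yes = [p_pooled * t for t in totals]
expected_no = [(1 - p_pooled) * t for t in totals]

# Pearson X2 for independence, df = (2 - 1) * (5 - 1) = 4.
x2 = sum((o - e) ** 2 / e
         for o, e in zip(contributed + did_not, expected_yes + expected_no))
df = len(years) - 1

def trend_statistic(scores, yes, total):
    """Return (r, M2): score/response correlation and M2 = (n - 1) * r**2."""
    n_all = sum(total)
    ybar = sum(yes) / n_all
    xbar = sum(s * t for s, t in zip(scores, total)) / n_all
    sxy = sum(s * y for s, y in zip(scores, yes)) - n_all * xbar * ybar
    sxx = sum(t * (s - xbar) ** 2 for s, t in zip(scores, total))
    syy = n_all * ybar * (1 - ybar)
    r = sxy / math.sqrt(sxx * syy)
    return r, (n_all - 1) * r ** 2

r, m2 = trend_statistic(years, contributed, totals)
print(f"X2 = {x2:.2f} on {df} df (0.01 critical value 13.28)")
print(f"r = {r:.3f}, M2 = {m2:.2f} on 1 df (0.01 critical value 6.63)")
```

M2 uses a single degree of freedom, so comparing it with X2 shows how much of the overall departure from independence the linear trend accounts for.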
----------------------------------------------------------------------
3. For the aspirin and myocardial infarction 2x2 table from the aspirin
handout and Agresti section 2.2.2, compute by plug-in (rather than SAS)
a point estimate and a 95% interval estimate for the odds ratio for an MI.
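
A plug-in computation might look like the sketch below, assuming the usual large-sample (Wald) interval on the log odds ratio with standard error sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22). The counts in the usage line are hypothetical placeholders, not the aspirin data; substitute the counts from the handout.

```python
import math

def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Sample odds ratio and Wald interval built on the log scale."""
    or_hat = (n11 * n22) / (n12 * n21)
    se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper

# HYPOTHETICAL counts for illustration only (rows = groups, columns = outcome).
or_hat, lower, upper = odds_ratio_ci(30, 70, 20, 80)
print(f"OR = {or_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval is symmetric on the log scale, so it is not symmetric around the point estimate on the odds ratio scale.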
---------------------------------------------------------------------
4. Chicago Crime data.
The cell counts that follow are said to be for the number of crimes
(perhaps daily or hourly?) committed in the Chicago area.
The variables are (1) type of neighborhood (suburb vs. center of city),
(2) socioeconomic status of neighborhood (high SES vs. low SES), and
(3) year the crimes were committed (1976 vs. 1986).
Here's the data:
                       Suburbs    Center of City
1976   High SES            5            10
       Low SES            15           120
1986   High SES            5            10
       Low SES            15            90
Call city-suburb 'C', SES 'S', and year 'T'.
a. From these sample data, what is the marginal CxS table?
What is the odds ratio for this table?
What are the partial C-S odds ratios?
b. Do any of the marginal tables exhibit Simpson's paradox?
c. Suppose that, from fitting log-linear models to these data, the
likelihood ratio fit statistics told you that the best model was
(CS, CT, ST). Which sets of variables appear to be conditionally
independent? Mutually independent?
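
For part a, collapsing over year and computing the odds ratios is mechanical. The sketch below is one way to organize it; the dictionary layout and function name are illustrative choices, and the odds ratio is oriented as (high SES suburb x low SES city) / (high SES city x low SES suburb).

```python
# table[year][ses] = (suburb count, center-of-city count), from the problem.
table = {
    1976: {"high": (5, 10), "low": (15, 120)},
    1986: {"high": (5, 10), "low": (15, 90)},
}

def odds_ratio(high, low):
    """C-S odds ratio from the (suburb, city) counts for high and low SES."""
    return (high[0] * low[1]) / (high[1] * low[0])

# Partial (conditional) C-S odds ratios, one per year.
partial = {year: odds_ratio(t["high"], t["low"]) for year, t in table.items()}

# Marginal CxS table: sum the two years cell by cell.
marginal_cs = {
    ses: tuple(sum(table[year][ses][k] for year in table) for k in range(2))
    for ses in ("high", "low")
}
marginal_or = odds_ratio(marginal_cs["high"], marginal_cs["low"])

print("partial C-S odds ratios:", partial)
print("marginal CxS table:", marginal_cs)
print("marginal C-S odds ratio:", marginal_or)
```

The same pattern, with the roles of the variables swapped, gives the C-T and S-T marginal and partial odds ratios needed to look for Simpson's paradox in part b.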
----------------------------------------------------
5. In an article about crime in the United States, Newsweek
magazine (Jan. 10, 1994) quoted FBI statistics stating that of
all blacks slain in 1992, 94% were slain by blacks, and of all
whites slain in 1992, 83% were slain by whites. Let Y denote race
of victim and X denote race of murderer.
a. Which conditional distribution do these statistics refer to,
Y given X, or X given Y?
b. Calculate and interpret the odds ratio between X and Y.
c. Given that a murderer was white, can you estimate the probability
that the victim was white? What additional information would you
need to do this?
(Hint: Use Bayes Theorem.)
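
The arithmetic for parts b and c can be sketched as follows, assuming only two race categories (so the complementary percentages are 6% and 17%). The marginal probability `p_victim_white` is not given in the problem; the value passed in the usage line is purely hypothetical and only shows how the answer depends on it.

```python
# Quoted statistics: P(X = black | Y = black) and P(X = white | Y = white).
p_xb_given_yb = 0.94
p_xw_given_yw = 0.83

# Odds of a black murderer for black victims, relative to white victims.
odds_black_victims = p_xb_given_yb / (1 - p_xb_given_yb)
odds_white_victims = (1 - p_xw_given_yw) / p_xw_given_yw
odds_ratio = odds_black_victims / odds_white_victims

def p_victim_white_given_murderer_white(p_victim_white):
    """Bayes theorem; p_victim_white = P(Y = white) must be supplied."""
    numerator = p_xw_given_yw * p_victim_white
    denominator = numerator + (1 - p_xb_given_yb) * (1 - p_victim_white)
    return numerator / denominator

print(f"odds ratio = {odds_ratio:.1f}")
# HYPOTHETICAL marginal probability, for illustration only.
print(p_victim_white_given_murderer_white(0.5))
```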
---------------------------------------------------------
6. A criminologist wants to estimate the proportion of U.S.
citizens who live in a home in which firearms are available. The
1991 General Social Survey asked respondents, “Do you have in
your home any guns or revolvers?” Of the respondents, 393
answered “yes” and 583 answered “no.” Construct a 90% confidence
interval for the true proportion of “yes.” Construct an exact CI
using SAS (or your own computation) and compare with the standard
large-sample normal approximation (which should do pretty well
for this example).
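
For the "own computation" route, one possibility is the Clopper-Pearson construction, which inverts the two binomial tail probabilities. The sketch below assumes that is the exact interval intended, solves for the limits by bisection, and uses z = 1.645 for the 90% normal-approximation interval. The helper names are illustrative.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f changes sign exactly once."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, conf=0.90):
    """Exact interval: each tail probability equals (1 - conf) / 2."""
    alpha = 1 - conf
    eps = 1e-12
    lower = 0.0 if x == 0 else bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2, eps, 1 - eps)
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2, eps, 1 - eps)
    return lower, upper

def wald(x, n, z=1.645):
    """Large-sample normal-approximation interval."""
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

yes, no = 393, 583
n = yes + no
exact_lo, exact_hi = clopper_pearson(yes, n)
wald_lo, wald_hi = wald(yes, n)
print(f"exact: ({exact_lo:.4f}, {exact_hi:.4f})")
print(f"Wald:  ({wald_lo:.4f}, {wald_hi:.4f})")
```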
-------------------------------------------------------
7. Death Penalty Example (from lecture)
Radelet (1981) studied effects of racial characteristics on
whether individuals convicted of homicide receive the death
penalty. The variables are “death penalty verdict,” having
categories (yes, no), and “race of defendant” and “race of
victim,” each having categories (white, black). The 326 subjects
were defendants in homicide indictments in 20 Florida counties
during 1976-1977, and the data form a 2 x 2 x 2 contingency
table.
Data are available in the course example (deathpen).
Agresti sec 3.1 has a larger, similar data set.
a. Run the set of log-linear models for the death penalty using
SAS Proc Genmod, and identify the best fitting model.
b. For the best-fitting model obtain the fitted odds-ratios
for the conditional and marginal 2x2 tables (e.g. the form
of entries in Agresti Table 6.5). Interpret.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
NOTE Problems 8,9,10 are more didactic extensions of
Lecture than typical HW problems. You may find it equally
useful to work through these with the solutions (at
your convenience).
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
8. Migraine example: anova-style analyses of 0,1 data
The Migraine Course Example discussed in lecture
has the form of an experimental 2x2 factorial design (factors
gender, treatment) with a binary outcome (same, better).
If the outcome were continuous/measured (e.g. amount of relief)
we would think we know how to analyze these data from anova-based
methods of Winter qtr (or even ed161).
Here are two options for anova-style analyses.
a. Consider a 2x2 design with the outcome for each cell
being the proportion of subjects getting "better"
(cf. NWK p. 773, "response is a proportion", for details
on the arcsine transformation, etc.).
b. Reconstruct the individual 0,1 data file for this 2x2 design,
which has 106 rows.
The experiment has cell sizes
       Active   Placebo
F        27       25
M        28       26
which is only slightly unbalanced. Use GLM in Minitab
to obtain an anova table and interpret.
---------------------------------------------------------------
9. Revisit Alcohol, Cigarette, and Marijuana Use Example
a. From the output from the best-fitting model (AC,AM,CM)
verify that the coefficients for the interaction terms
are the natural logs of the odds-ratios for the
fitted values (e.g., those shown in Agresti p. 153, Table 6.5).
b. A, C, M individual 0,1 data: phi coefficients and correlations
The question raised in lecture (and not fully answered):
what can we 'learn' from this study?
The Alcohol, Cigarette, and Marijuana Use example is
simply an observational (correlational) study with
3 intercorrelated variables, all of which happen to be
binary.
One way to approach the study is to obtain the 3x3
correlation matrix for the three variables over the 2276
respondents. These correlation coefficients are phi coefficients
for the corresponding 2x2 tables. Compute the 3 pairwise
correlation coefficients, either by reconstructing the individual
0,1 data or by computing phi coefficients from the counts.
(Formulas for the phi coefficient in terms of table counts
or chi-square were given in lecture and appear in basic texts such as
Hopkins & Glass.)
--Compare these correlation coefficients with the
corresponding marginal associations (odds-ratios)
obtained from the best fitting log-linear model.
--Compare the partial correlation coefficients,
AC.M AM.C CM.A, with the conditional association
(odds-ratios) obtained from the best fitting
log-linear model.
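
The phi and partial-correlation computations can be sketched from cell counts alone. The code below assumes the count formula phi = (n11*n22 - n12*n21) / sqrt(product of the four marginal totals) and the standard first-order partial correlation formula. The 2x2x2 counts are HYPOTHETICAL placeholders, not the survey data; replace them with the counts from the course example.

```python
import math
from itertools import combinations

# HYPOTHETICAL counts keyed by (A, C, M), with 1 = yes and 0 = no.
counts = {
    (1, 1, 1): 40, (1, 1, 0): 30, (1, 0, 1): 10, (1, 0, 0): 20,
    (0, 1, 1): 5,  (0, 1, 0): 15, (0, 0, 1): 5,  (0, 0, 0): 75,
}

def marginal_2x2(cells, i, j):
    """Collapse over the third variable; returns (n11, n10, n01, n00)."""
    t = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for key, n in cells.items():
        t[(key[i], key[j])] += n
    return t[(1, 1)], t[(1, 0)], t[(0, 1)], t[(0, 0)]

def phi(n11, n10, n01, n00):
    """Pearson correlation of two 0/1 variables, computed from counts."""
    num = n11 * n00 - n10 * n01
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y with z partialled out."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

names = "ACM"
r = {names[i] + names[j]: phi(*marginal_2x2(counts, i, j))
     for i, j in combinations(range(3), 2)}
partials = {
    "AC.M": partial_corr(r["AC"], r["AM"], r["CM"]),
    "AM.C": partial_corr(r["AM"], r["AC"], r["CM"]),
    "CM.A": partial_corr(r["CM"], r["AC"], r["AM"]),
}
print({k: round(val, 3) for k, val in r.items()})
print({k: round(val, 3) for k, val in partials.items()})
```

The pairwise phi values line up with the marginal odds ratios, and the partial correlations with the conditional odds ratios, which is the comparison the two items above ask for.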
-------------------------------------------------------------------
10. Odds ratios and correlations
One way to try to interpret odds ratios and log-odds
is by linking them with the more familiar idea of correlation
coefficients. For binary data the phi-coefficient is
the usual Pearson correlation coefficient.
Try the following exercise:
Start with a 2x2 table with 100 counts in each cell
(i.e., no association). Then construct a series
of 2x2 tables by altering the diagonals, adding
or subtracting 10, 20, 30, 40, or 50 counts and adjusting the
off-diagonals to keep the marginal counts constant.
Example
Base 2x2 table      Altered table (adding 20 to diagonal)
  100  100            120   80
  100  100             80  120
Use a set of these tables to see how the odds ratio corresponds
to a value of the correlation coefficient.
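
The whole series of tables can be generated in a short loop. This is a minimal sketch of the exercise as described; the variable names are illustrative.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def phi(a, b, c, d):
    """Pearson correlation for 0/1 data, computed from the cell counts."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

results = {}
for delta in range(-50, 51, 10):
    # Add delta to the diagonal cells and remove it from the off-diagonal
    # cells, so every marginal total stays at 200.
    a = d = 100 + delta
    b = c = 100 - delta
    results[delta] = (odds_ratio(a, b, c, d), phi(a, b, c, d))

print("delta   odds ratio   log odds ratio     phi")
for delta, (theta, r) in results.items():
    print(f"{delta:5d}   {theta:10.3f}   {math.log(theta):14.3f}   {r:6.2f}")
```

With all margins fixed at 200, phi works out to delta/100 while the odds ratio is ((100 + delta) / (100 - delta))^2, so phi moves linearly in delta and the odds ratio does not.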
------------------------------
-------------------------------
end of HW6