Education 351,
Workshop in Technical Quality of Educational Assessments
Ed Haertel,  David Rogosa

Spring 2003,   e313,  Tuesday 2:15-5 PM

Schedule of Topics
  1. 4/1    Introduction, Overview
  2. 4/8    Accuracy of Individual Scores
  3. 4/15   Standards Setting
  4. 4/29   Current Events in Assessment and Accountability
  5. 5/6    Properties of Group Summaries
  6. 5/13   Properties of Accountability Systems.  Special time 3-5:30 PM (due to Faculty Mtg overrun, by edict of D Stipek)
  7. Wed 5/21   Joint meeting with Ed353C, Applications of Generalizability Theory, usual room e313, 4:15-5:45 PM
  8. 5/27   Alternative Models for Accountability Systems
  9. 6/3    Equating
Readings and Resources

In the News
"Why kids give state tests short shrift: Students know scores -- crucial to schools -- won't affect grades," by Meredith May, May 7, 2003

Introductory Polemic
"Why Testing Experts Hate Testing,"
by Richard P. Phelps, Fordham Report, Vol. 3, No. 1, January 1999. Phelps engages in a point-by-point analysis and rebuttal of eight arguments that testing experts commonly fling against standardized testing.

Group Summary Examples: NCLB, etc.
Web page with links to all approved state plans under NCLB:
Richard Elmore's article "Testing Trap" from Harvard Magazine

Berliner NEA Reports and refutations
Amrein & Berliner papers:
December 2002, "The Impact of High-Stakes Tests on Student Academic Performance"
December 2002, "An Analysis of Some Unintended and Negative Consequences of High-Stakes Testing"
The one referenced by Ed's slides was "High-stakes testing, uncertainty, and student learning"
The Raymond & Hanushek critique
Daniel Weintraub: Research damning tests draws a flawed conclusion

Readings for 4/8, Accuracy of Individual Scores
Shoe Shopping and the Reliability Coefficient
"How Tests Can Drop the Ball," by Richard Rothstein, New York Times, September 13, 2000
Accuracy Guide and Associated Technical Reports, David Rogosa
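One quantity at the heart of these accuracy reports is the chance that an observed score lands near the student's true score. A minimal sketch, assuming classical test theory with normally distributed measurement error; the SEM value is illustrative, not taken from the Rogosa reports:

```python
import math

def prob_within(d, sem):
    """P(|observed - true| <= d) under normal measurement error."""
    return math.erf((d / sem) / math.sqrt(2))

sem = 3.0   # assumed standard error of measurement, in score points
print(round(prob_within(5, sem), 3))   # roughly 0.90
```

Even a seemingly tight 5-point band is missed about 10% of the time with these numbers; a 2-point band is closer to a coin flip.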

Materials for 4/15, Standards and Standard Setting
    See especially Section 3 by Mark Reckase and Section 4 by Robert Forsyth
    See especially Sections 6 (by ACT staff) and 7 (by W. James Popham)
    The actual performance standards for California Standards Tests in Math, English Language Arts, History/Social Science, and Science
In-press paper by Haertel and Lorie, "Validating Standards-Based Test Score Interpretations."
Controversy over raising passing score on state bar exam
Controversy over passing score on Regents physics exam
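Cut-score controversies like the two above turn on how a passing score is derived in the first place. As a concrete illustration of one common judgmental approach (an Angoff-style method, with hypothetical panel data not drawn from the readings), the cut score is the sum over items of panelists' mean judged probability that a minimally competent examinee answers the item correctly:

```python
# Hypothetical judgments: panelist -> probability a minimally competent
# examinee answers correctly, one entry per item
ratings = {
    "Panelist A": [0.80, 0.55, 0.70, 0.40],
    "Panelist B": [0.75, 0.60, 0.65, 0.50],
    "Panelist C": [0.85, 0.50, 0.60, 0.45],
}

n_items = 4
item_means = [sum(judgments[i] for judgments in ratings.values()) / len(ratings)
              for i in range(n_items)]
cut_score = sum(item_means)      # expected raw score of a borderline examinee
print(round(cut_score, 2))       # → 2.45 out of 4 items
```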

Materials for 4/29, Current Events in Assessment and Accountability
CCSSO Resources on No Child Left Behind

Materials for 5/6, Statistical Properties of Group Summaries; Introduction to Accountability Systems
Data Analyses for Accountability Systems
Interpretive Notes Series available from API Research page

    * Analyses of AB1114 Schools
          David Rogosa, January 2002
    * Year 2001 Growth Update: Interpretive Notes for the Academic Performance Index
          David Rogosa, December 2001
    * Year 2000 Update: Interpretive Notes for the Academic Performance Index
          David Rogosa, October 2001
    * Interpretive Notes for the Academic Performance Index (API)
          David Rogosa, November 2000
    * Student Progress in Charter Schools
          David Rogosa, CSE Technical Report 521, May 2002
Statistical Properties
Accuracy Reports available from API Research page
    * Plan and Preview for API Accuracy Reports
          David Rogosa, July 2002
    * Accuracy of API Index and School Base Report Elements
          David Rogosa, December 2002
    * Year 2000 Update: Accuracy of API Index and School Base Report Elements
          David Rogosa, December 2002
    * Year 2001 Update: Accuracy of API Index and School Base Report Elements
          David Rogosa, December 2002

Materials for 5/13, Answering Attacks on Accountability Systems
available from API Research page
Irrelevance of Reliability Coefficients to Accountability Systems: Statistical Disconnect in Kane-Staiger "Volatility in School Test Scores"
    David Rogosa, October 2002
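The disconnect the paper above argues can be seen in a back-of-the-envelope calculation (a hedged sketch with made-up numbers, not the paper's own analysis): measurement error that dominates an individual score largely averages out of a school-level mean.

```python
import math

# Illustrative classical-test-theory numbers (assumed, not from the paper)
var_true = 100.0    # true-score variance among students
var_err = 150.0     # error variance in an individual score

reliability = var_true / (var_true + var_err)     # 0.40: a "bad" test
sem_individual = math.sqrt(var_err)               # ~12.2 score points

n_students = 400
se_mean_error = math.sqrt(var_err / n_students)   # ~0.6 points
print(round(reliability, 2), round(sem_individual, 1), round(se_mean_error, 2))
```

With these numbers an individual score is quite unreliable, yet the error contribution to a 400-student school mean has standard error near half a point, so the behavior of the group summary is not read off the individual-score reliability coefficient.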

Commentaries on the Orange County Register Series
    * What's the Magnitude of False Positives in GPA Award Programs?
          David Rogosa, September 2002
    * Application of OCR "margin of error" to API Award Programs
          David Rogosa, September 2002

Materials for 5/21, Joint 351/353.  What is Generalizability Theory Good For?

Example 1. Behavioral Observations
Main references:
Rogosa, D. R., Floden, R. E., & Willett, J. B. (1984). Assessing the stability of teacher behavior. Journal of Educational Psychology, 76, 1000-1027.
Rogosa, D. R., and Ghandour, G. A. (1991). Statistical models for behavioral observations (with discussion). Journal of Educational Statistics, 16, 157-252.
Rogosa, D. R., and Ghandour, G. A. (1991). Reply to discussants: Statistical models for behavioral observations. Journal of Educational Statistics, 16, 281-294.

Example 2. Educational Assessments: performance and otherwise
Handouts from presentation: Accuracy of Individual Scores and Group Summaries. CCSSO SCASS Technical Guidelines for Performance Assessment, May 20, 1998, Durham, NC.
        G variance Components
Familiar G-favorites: ptr designs
SIZE does matter (some) but ORDER really matters ....
pXtXr data: continuous formulation: Task-wobble, Rater Smear
   Examples from Artificial Continuous Formulation: Task Wobble, Rater Smear
   Quest to reproduce CLBH variance components
Examples from Artificial Discrete Formulation: Task and Rater Misclassifications
        Individual Score Accuracy
CLBH Gullwing for Misclassification    gull-in-motion
June 1996 Presentation Handout at CCSSO (Council of Chief State School Officers) Conference on Large-Scale Assessment, Phoenix, AZ.
Examples of the Performance of G-theory Extensions for Estimating Error   David Rogosa and Haggai Kupermintz, June 24, 1996 (Original booklet format is given here in page-by-page pdf format)
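The p×t×r (person × task × rater) crossed design running through these handouts can be sketched as a small simulation: generate fully crossed scores from assumed variance components, then recover the components from the ANOVA mean squares via the standard random-model expected-mean-square equations (with one observation per cell, σ²_ptr is confounded with error). All component values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_p, n_t, n_r = 100, 4, 2          # persons, tasks, raters

# assumed true variance components for the simulation
s2 = dict(p=1.0, t=0.2, r=0.1, pt=0.4, pr=0.1, tr=0.05, ptr=0.5)

# fully crossed scores: main effects + two-way interactions + ptr/error
X = (rng.normal(0, s2['p']**0.5, (n_p, 1, 1))
     + rng.normal(0, s2['t']**0.5, (1, n_t, 1))
     + rng.normal(0, s2['r']**0.5, (1, 1, n_r))
     + rng.normal(0, s2['pt']**0.5, (n_p, n_t, 1))
     + rng.normal(0, s2['pr']**0.5, (n_p, 1, n_r))
     + rng.normal(0, s2['tr']**0.5, (1, n_t, n_r))
     + rng.normal(0, s2['ptr']**0.5, (n_p, n_t, n_r)))

m = X.mean()
mp, mt, mr = X.mean(axis=(1, 2)), X.mean(axis=(0, 2)), X.mean(axis=(0, 1))
mpt, mpr, mtr = X.mean(axis=2), X.mean(axis=1), X.mean(axis=0)

MSp = n_t * n_r * np.sum((mp - m)**2) / (n_p - 1)
MSpt = n_r * np.sum((mpt - mp[:, None] - mt[None, :] + m)**2) / ((n_p - 1) * (n_t - 1))
MSpr = n_t * np.sum((mpr - mp[:, None] - mr[None, :] + m)**2) / ((n_p - 1) * (n_r - 1))
resid = (X - mpt[:, :, None] - mpr[:, None, :] - mtr[None, :, :]
         + mp[:, None, None] + mt[None, :, None] + mr[None, None, :] - m)
MSptr = np.sum(resid**2) / ((n_p - 1) * (n_t - 1) * (n_r - 1))

# expected-mean-square solutions for selected components
est = {
    'ptr': MSptr,
    'pt': (MSpt - MSptr) / n_r,
    'pr': (MSpr - MSptr) / n_t,
    'p': (MSp - MSpt - MSpr + MSptr) / (n_t * n_r),
}
print({k: round(v, 2) for k, v in est.items()})
```

With 100 persons the person-side components come back near the assumed values; components resting on only 4 tasks or 2 raters are far noisier, which is the practical concern these handouts explore.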

Materials for 5/27, Alternative Models for Accountability Systems
This session contrasts several models, including:
* successive-cohort [like the API]
* value-added
* fixed base-year [similar to API, but with "ramp" from base year to
  performance target rather than recalculation of base every year]
* fixed-target [like AMOs {Annual Measurable Objectives} for AYP under
  NCLB]
* hybrid systems [incorporating features of several alternatives, e.g.
  via "safe harbor" provision of NCLBA]
Teacher Effects as a Measure of Teacher Effectiveness: Construct Validity Considerations in TVAAS (Tennessee Value Added Assessment System). Haggai Kupermintz, Lorrie Shepard, Robert Linn, University of Colorado at Boulder.
"A Better Student Data System for California," paper by the Public Forum on School Accountability