Educational Accountability:
Fair and Balanced

Resource Index for presentation
September 7, Seminar on Testing,
Hechinger Institute and Center on Education Policy

David Rogosa
Stanford University


NEW      Handouts from September 7 presentation

Most "experts" in the educational research community that you as journalists would reasonably rely upon for expertise in assessment and accountability issues cannot supply such.  Arising from this dearth of knowledge on statistical issues key to accountability systems (or even large scale assessments) is the opportunity for many leading figures in educational research to substitute their own ideological (anti-testing) biases for the facts or to bash testing programs for self-promotional purposes.  All educational researchers best left behind?

Accountability is not a bad thing, but it can be done badly. And that's where statisticians (should) come in, to ensure that the policy directives are implemented in a defensible form.


Policy Research and Journalism Vignettes

  •   The Volatility Scam
    Claims of "volatility" in the school-level scores from testing programs by Linn and Haug (2002) and by Kane and Staiger (2002) represent a serious threat to defensible policy uses of test scores in school accountability systems. However, such claims are based on blunders at the level of high school statistics instruction. See Confusions about Consistency in Improvement  (especially the intro examples in section 1.2); additional, perhaps less accessible, material specific to Kane-Staiger is in Irrelevance of Reliability Coefficients to Accountability Systems.
  •   "Margin of Error" Nonsense and the Orange County Register Debacle
    The margin of error is a misunderstanding of elementary statistical concepts that leads to hilarious assertions. Sadly, last August the Orange County Register based their series of attacks on the California API on this nonsense: chief experts/charlatans Richard Hill and Thomas Kane.
           NEW 12/03. Book Chapter treatment of the Orange County Register folly
    Older versions: see the "Commentaries on the Orange County Register Series" at the API Research Page ; in particular the "High School Intern and the API Dollars" in What's the Magnitude of False Positives in GPA Award Programs? and the "Blood Pressure Parable" in Application of OCR "margin of error" to API Award Programs
  •   Sanctions are Not the Flip-side of Awards
    Award programs, such as California API GPA, have false positives and false negatives, and these are not symmetric. Basing sanctions on a failure to reach award criteria is undesirable. In other words, where should "the benefit of the doubt" be applied?  Properties of award programs are discussed in various documents on the API Research Page.
  •   Accuracy of Individual Scores
    Properties of individual student scores, such as student percentile rank scores from standardized tests that go to parents and schools and which are also sometimes used for high stakes decisions, are typically described by test reliability coefficients. Unfortunately, reliability coefficients are one of the dumbest ideas ever and provide little useful information. Various documents and analyses for the accuracy of individual scores (including analyses of the CAT/6 and Stanford 9) are provided on the Accuracy Guide page.
    Simplest place to start is the Shoe-Shopping Example.
  •   Demographics are far from Deterministic
    The California Teachers Association (and other critics of testing programs) seek to undermine the credibility of assessment programs with slogans such as "It's All Zip Codes" and renaming the API as the "affluent parent index". Many policy researchers (e.g. California Budget Project) feed this misrepresentation with unthoughtful correlational and multiple regression analyses. Reasonable data analysis shows that schools (and students) with similar demographic composition have very different educational performance. See the analyses in the Interpretive Notes series on the API Research Page.
         NEW 10/03   four-peat data analysis for California
  •   NCLB, Where Accountability Came to Die?
    A teaser for forthcoming work: should the question mark be removed?
    Wise man statement:
    "It is a bad system to punish people when you set standards they can't possibly make," said Roy Romer, superintendent of the Los Angeles Unified School District, the largest school system in the state. (Los Angeles Times, Aug 16)
    1.  California's AMOs Are More Formidable Than They Appear   October 2003
    2.  The NCLB "99% confidence" scam: Utah-style calculations     November 2003
    3.  Why NCLB is a Statistical Sham. Part I: How the Confidence Interval (margin of error) Procedures Destroy the Credibility of State NCLB Plans   Draft November 2003
    4. Assessing the effects of multiple subgroups: Rebuttal to PACE Policy Brief December 2003 "Penalizing Diverse Schools? Similar test scores, but different students, bring federal sanctions"   December 2003
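
Illustration for the "Sanctions are Not the Flip-side of Awards" vignette above. The following is a minimal sketch, not an analysis from the API Research Page, of why false positives and false negatives in an award program need not be symmetric. Every number in it is a hypothetical assumption: true school gains are taken to be normally distributed around zero, observed gains add independent normal noise, and a school is "awarded" when its observed gain meets a fixed target. The function name and parameters are invented for the example.

```python
import random


def simulate_award_errors(n_schools=100_000, true_sd=5.0, noise_sd=5.0,
                          target=5.0, seed=0):
    """Count award outcomes under a toy model (all values hypothetical).

    true gain     ~ Normal(0, true_sd)
    observed gain = true gain + Normal(0, noise_sd)
    A school is awarded when its observed gain >= target, and it
    "deserves" the award when its true gain >= target.
    """
    rng = random.Random(seed)
    counts = {"true_positive": 0, "false_positive": 0,
              "false_negative": 0, "true_negative": 0}
    for _ in range(n_schools):
        true_gain = rng.gauss(0.0, true_sd)
        observed_gain = true_gain + rng.gauss(0.0, noise_sd)
        deserved = true_gain >= target
        awarded = observed_gain >= target
        if awarded and deserved:
            counts["true_positive"] += 1
        elif awarded:
            counts["false_positive"] += 1   # awarded, true gain below target
        elif deserved:
            counts["false_negative"] += 1   # not awarded, true gain met target
        else:
            counts["true_negative"] += 1
    return counts


if __name__ == "__main__":
    for label, count in simulate_award_errors().items():
        print(f"{label}: {count}")
```

In this toy model, a target set above the average true gain produces more false positives than false negatives, and a target set below the average reverses the imbalance (try target=-5.0). The two error types do not mirror each other, so a sanction defined as "failed to win the award" inherits a different error profile than the award itself.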
Discussion Item.   Rebutting bad research, Process and policy?
Is the null set the best and only answer?  What do and what should journalists do after reporting in good faith on demonstrably incompetent research? Contrast education with reporting on medical research (e.g. New York Times Tuesday Health section).
   Education examples:     charter schools,     teacher credentialling     
          Public Forum on School Accountability  "A Better Student Data System for California"

Acknowledgements    Support for the research reported here has been provided by

  • the California Department of Education, Policy and Evaluation Division.
  • the Educational Research and Development Centers Program, PR/Award Number R305B60002 as administered by the Institute of Education Sciences, U.S. Department of Education. The findings and opinions expressed do not reflect the positions or policies of the National Institute on Student Achievement, Curriculum, and Assessment, the Institute of Education Sciences, or the U.S. Department of Education.