
Argument: NCLB standardized tests are a poor measure of school performance

Issue Report: No Child Left Behind Act

Support

Walter Haney. “Evidence on Education under NCLB (and How Florida Boosted NAEP Scores and Reduced the Race Gap)”. September 8, 2006 – The No Child Left Behind Act has brought increased attention to the rating of school quality in terms of student performance on state math and reading tests. However, many observers have noted the weakness of rating school quality simply in terms of such measures. Doubts arise not just because of the non-comparability of state reading and math tests and ratings based on them (Linn & Baker, 2002), but for the more fundamental reason that the goals of public education in the U.S. clearly extend beyond the teaching of reading and math skills. To address the former problem, many observers have suggested relying on state National Assessment of Educational Progress (NAEP) results as a common metric of student performance in grades 4 and 8 in reading and math (and occasionally other subjects) across the states.

The broader question of how school quality might be judged was raised at the 2006 convention of the National Education Association (NEA). The NEA endorsed a system of accountability “based on multiple benchmarks, including teacher-designed classroom assessments, student portfolios, graduation statistics, and college enrollment rates, among other measures” (Honawar, 2006, p. 8). The problem of reaching summary judgments on school quality is also addressed at least implicitly in the exercise I distributed here, “Rating School Quality Exercise.” This is the sort of exercise I have used for 20 years, and the results illustrate the perils and indeed the mathematical impossibility of reaching sound summary judgments on matters of educational quality and educational inequality. Before addressing these matters, I discuss the illusion of progress in Florida’s 2005 grade 4 NAEP results, and the value of examining rates of student progress through the K-12 grade span as evidence of school system quality. In conclusion, I suggest how the upcoming reauthorization of the NCLB Act might be shaped.

Charles Murray. “Acid Tests”. Wall Street Journal. July 25, 2006 – The Department of Education will undoubtedly produce numbers to dispute the findings of the Civil Rights Project, which brings me to the point of this essay. Those numbers will consist largely of pass percentages, not mean scores. A particular score is deemed to separate “proficient” from “not proficient.” Reach that score, and you’ve passed the test. If 60% of one group–blondes, let’s say–pass while only 50% of redheads pass, then the blonde-redhead gap is 10 percentage points.

A pass percentage is a bad standard for educational progress. Conceptually, “proficiency” has no objective meaning that lends itself to a cutoff. Administratively, the NCLB penalties for failure to make adequate progress give the states powerful incentives to make progress as easy to show as possible. A pass percentage throws away valuable information, telling you whether someone got over a bar, but not how high the bar was set or by how much the bar was cleared. Most importantly: If you are trying to measure progress in closing group differences, a comparison of changes in pass percentages is inherently misleading.
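
A toy calculation makes the information loss concrete. The sketch below is a hypothetical illustration in Python (the class names, scores, and cutoff are all invented for the example): two classes report the identical pass rate against the same bar, yet their score distributions, and what each school would need to do next, look nothing alike.

```python
# Hypothetical illustration: two classes with the same pass rate but very
# different score distributions. All scores and the cutoff are invented.
CUTOFF = 60  # score needed to count as "proficient" on an assumed 0-100 scale

class_a = [57, 58, 59, 61, 61, 62, 62, 63, 63, 64]   # everyone clustered at the bar
class_b = [20, 35, 40, 95, 96, 97, 98, 99, 99, 100]  # stars and strugglers

def pass_rate(scores):
    """Fraction of students at or above the cutoff."""
    return sum(s >= CUTOFF for s in scores) / len(scores)

def mean(scores):
    return sum(scores) / len(scores)

for name, scores in [("Class A", class_a), ("Class B", class_b)]:
    print(f"{name}: pass rate {pass_rate(scores):.0%}, mean score {mean(scores):.1f}")
# Both classes report a 70% pass rate; the pass percentage alone cannot
# distinguish students who barely cleared the bar from those far above it,
# or reveal how far below it the non-passers fell.
```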

Take the case of Texas, from which George Bush acquired his faith in NCLB. As the president described it to the Urban League in 2003: “In my state, Texas, 73% of the white students passed the math test in 1994, while only 38% of African-American students passed it. So we made that the point of reference. We had people focused on the results for the first time–not process, but results. And because teachers rose to the challenge, because the problem became clear, that gap has now closed to 10 points.” President Bush’s numbers are accurately stated. They are also meaningless.

Any test that meets ordinary standards produces an approximation of what statisticians call a “normal distribution” of scores–a bell curve–because achievement in any open-ended skill such as reading comprehension or mathematics really is more or less normally distributed. The tests that produce anything except a bell curve are usually ones so simple that large proportions of students get every item correct. They hide the underlying normal distribution, but don’t change it. Thus point No. 1, that using easy tests and discussing results in terms of pass percentages obscures a reality that NCLB seems bent on denying: All the children cannot be above average. They cannot all even be proficient, if “proficient” is defined legitimately. Some children do not have the necessary skills. Point No. 2 goes to the inherent distortions introduced by the use of pass percentages: Because of the underlying normal distribution, a gain in a given number of points has varying effects on group differences depending on where the gain falls.
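
Murray's second point can be reproduced with a back-of-the-envelope calculation. The sketch below is a hypothetical illustration (it uses standard normal distributions and invented cutoffs, not the actual Texas data): the gap between two groups is held fixed at one standard deviation, and only the placement of the "proficient" cutoff changes, yet the reported pass-percentage gap swings from about 34 points to about 2.

```python
# Hypothetical illustration of the pass-percentage distortion: a fixed one
# standard deviation gap between two groups yields very different pass-rate
# gaps depending on where the cutoff sits on the bell curve.
# Group parameters and cutoffs are invented for illustration only.
from statistics import NormalDist

group_high = NormalDist(mu=0.0, sigma=1.0)   # higher-scoring group
group_low = NormalDist(mu=-1.0, sigma=1.0)   # lower-scoring group, one SD behind

def pass_pct(dist, cutoff):
    """Percent of the distribution scoring at or above the cutoff."""
    return 100 * (1 - dist.cdf(cutoff))

print("cutoff (SD)   high group   low group   gap (points)")
for cutoff in [0.0, -1.0, -2.0, -3.0]:  # progressively easier tests
    hi = pass_pct(group_high, cutoff)
    lo = pass_pct(group_low, cutoff)
    print(f"{cutoff:>11.1f}   {hi:>10.1f}   {lo:>9.1f}   {hi - lo:>12.1f}")
# The underlying one-SD difference never changes, but the reported gap
# shrinks from roughly 34 percentage points to about 2 as the bar is lowered.
```

Lowering the bar (or adopting an easier test) therefore shrinks the reported gap without any change in the underlying achievement difference, which is precisely the sense in which comparisons of pass percentages can mislead.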