A combination of new federal accountability measures, states' plans to comply with them, and new commercial testing products threatens students, teachers and schools with a new wave of inappropriate high-stakes testing.
So cautions Teachers College measurement-evaluation expert Madhabi Chatterji in a publication released this week by a center based at the University of Colorado, Boulder.
Chatterji's warning, along with a series of guidelines for preventing the scenarios she fears, comes as 44 states begin implementing their federally approved plans for meeting the testing and accountability requirements of the Every Student Succeeds Act (ESSA), enacted in 2015 under President Obama.
Many of these states are planning to use "statistically derived indices from test-based data to rank, rate or examine growth of schools or education systems to fulfill ESSA's requirements," writes Chatterji, Professor of Measurement, Evaluation & Education. "However, measurement experts, researchers and professional associations (such as the American Educational Research Association and the American Statistical Association) have cautioned against several of these, particularly 'student growth percentiles,' 'value-added' growth models, and multi-indicator 'composite' scores."
Misuse of test information in this way, Chatterji writes, is akin to "misreading a Fahrenheit thermometer in degrees Celsius."
Chatterji, who is also founding director of TC's Assessment and Evaluation Research Initiative, says her "Consumer's Guide" is not a critique of particular standardized tests or testing programs, but instead a "'tool kit' for state, national, and district policymakers (and the assessment specialists/researchers who assist them) to help avert the most common pitfalls and adverse consequences of inappropriate test information use for students, families and concerned stakeholders." A key message, one that Chatterji has delivered in many past writings, is that "validity is not a fixed property" that can be built into tests. Rather, she writes, "the extent to which tests yield meaningful or valid information on student learning, or the quality of schooling, depends on how appropriately test results are put to use in decision-making contexts."
The nation's recent track record in that regard has not been encouraging. For example, the old SAT was not designed to measure schools' effectiveness, but in 2012, under the No Child Left Behind Act (ESSA's predecessor), many school districts used it as the basis for identifying exceptional schools and practices.
Nor have test developers helped the situation. Rather than simply providing students "raw" scores on standardized tests (the total points a student earns for providing correct answers), makers of standardized tests typically provide "scaled scores": scores that have been transformed to enable comparisons among students who took different levels or forms of a test. The statistical wizardry involved can be so complex that such "derived" scores become a "black box" to most test users, increasing the likelihood that they will be misused for policy purposes.
The new ESSA guidelines further increase the chances for testing misuse, Chatterji says, because they place heavy pressure and tight restrictions on states to meet self-set goals, including long-term "growth-related" targets.
On the broadest level, Chatterji recommends that all test users specify, up front, the kinds of inferences they intend to draw from test data; that they avoid "multi-purposing" tests in ways that go beyond either the test's intended use or the reported evidence; that they justify their uses and inferences of test-based data by referring to specific appropriate criteria for validity, reliability and utility; and that they seek out expert technical review before using tests for accountability purposes.
Among her other, more specific recommendations, Chatterji calls for the use of "descriptive quality profiles" (reports that present locally valued indicators of student and school success separately) instead of complex statistical indices.
The "high stakes" of states' ESSA rollout plans go beyond the immediate impact on schools and students. Past assessments have prompted major backlashes, Chatterji notes: for example, the Opt Out movement (parents who refuse to let their kids take standardized tests in public schools) and the concurrent decision by many states to opt out of the two national consortia that are implementing assessments geared to the Common Core State Standards. Such fragmentation can result in individual states adopting policies based on their own tests, with similar patterns of misuse of testing data.
And yet, Chatterji says, "there is a political demand for high-stakes uses of test data that is likely to continue.
"Regardless of the recent backlash, the public still seeks standardized test scores, not only for students, but also obtaining a better gauge of their local schools. Combined, these factors create conditions for some of the recurring testing issues that this guide identifies."