Standardized test results play an important role in the education landscape today. The results from state-mandated standardized tests are used to make multiple determinations and interpretations about teachers, school administrators, students, and school quality. In most cases, state education bureaucrats use the results from one mandated standardized test in mathematics and one in English language arts for multiple purposes. Those purposes include meeting the various Race to the Top grant and No Child Left Behind waiver reporting requirements for teacher and principal effectiveness, as well as for students' college and career readiness.
The results from the state-mandated high school mathematics test in Grade 11 could be used to make determinations about (a) the effectiveness of the high school principal, (b) the effectiveness of the high school math teachers, (c) the quality of the school district’s mathematics program, (d) whether a Grade 11 student is college ready, (e) whether that student is career ready, (f) a student’s strengths and weaknesses in math, (g) Grade 12 course placements for that student, and (h) whether the student can graduate high school. That is eight determinations made totally or in part from one test score.
If the test results have not been validated for making multiple determinations, then the decisions made about educators, students, schools, and school districts that are based on the results could be flawed.
One current example is the use of state test results to rank schools and school districts and to reward and punish them. As I elaborate upon in Education Policy Perils: Tackling the Tough Issues, results from state standardized tests can be predicted with a great deal of accuracy at the school and district levels, using only community demographic data. Some school and district educators are needlessly critiqued, replaced, or put on corrective action while others receive praise, all based on test results that have not been validated for making those types of determinations.
The seventh edition of the Standards for Educational and Psychological Testing contains 12 categories of standards and provides specific guidance on topics that include appropriate test design, development, validity, and use of standardized tests and results (AERA, APA, & NCME, 2014). Standard 1.0 states, “Clear articulation of each intended test score interpretation for a specified use should be set forth, and appropriate validity evidence in support of each intended interpretation should be provided” (AERA et al., 2014, p. 23). Standard 1.1 expands on this guidance: “No test permits interpretations that are valid for all purposes or in all situations. Each recommended interpretation for a given use requires validation” (AERA et al., 2014, p. 23). Standard 1.1 further recommends, “A rationale should be presented for each intended interpretation of test scores for a given use, together with a summary of the evidence and theory bearing on the intended interpretation” (AERA et al., 2014, p. 23).
For example, using a standardized test administered in Grade 3 to determine college and career readiness would potentially require a validation period of 8 years for the college readiness determination and longer for career readiness validation. College readiness and career readiness are two different determinations and require two separate validations of the test results to make those determinations. Similarly, one might argue for more evidence of validity in the case where an elementary school principal receives an ineffective rating based on school standardized test scores while the majority of her teachers are rated effective via the same test results.
The authors of the standards present specific cautions about using results from standardized tests for multiple purposes in educational settings like P–12 public schools. A test designed to measure the effectiveness of a school principal may not be valid for measuring the effectiveness of a classroom teacher. The authors state clearly, “No one test will serve all purposes equally well” (AERA et al., 2014, p. 195).
Users of standardized test results should attempt to confirm the results for groups and individuals by obtaining multiple forms of data about those groups or individuals. Data from various sources should be triangulated so that a decision is not made based only upon the results from a single state-mandated standardized test.
Educators in a school could develop a menu of other indicators that they could use to make important decisions about students and teachers without using any results from state-mandated standardized tests. They could create a simple matrix with the types of determinations to be made listed down the left side and all the existing sources of data at hand running along the top. Then they would be able to easily identify determinations that are not supported by at least three different types of data. That could help alert educators to the types of assessments they might have to develop in-house. More importantly, the exercise could help educators kick the habit of using results from one test for multiple purposes for which the test was not designed.
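For readers who keep such records electronically, the matrix exercise described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the determinations, the data sources, and the function name `under_supported` are hypothetical placeholders, not taken from any actual school or from the Standards. The threshold of three data types follows the suggestion in the preceding paragraph.

```python
# A minimal sketch of the determination-by-data-source matrix.
# All determinations and data sources below are hypothetical examples.

MIN_SOURCES = 3  # the "at least three different types of data" threshold

# Rows: determinations to be made. Values: data sources currently at hand.
evidence_matrix = {
    "Grade 12 course placement": {
        "course grades", "teacher recommendation", "state test",
    },
    "Student strengths and weaknesses in math": {
        "classroom assessments", "state test",
    },
    "Teacher effectiveness": {"state test"},
}


def under_supported(matrix, min_sources=MIN_SOURCES, excluded=frozenset()):
    """Map each determination that has fewer than min_sources usable data
    types to the number of additional types it still needs.

    Sources named in `excluded` (for example, the state test) are not
    counted toward the threshold.
    """
    gaps = {}
    for determination, sources in matrix.items():
        usable = set(sources) - set(excluded)
        if len(usable) < min_sources:
            gaps[determination] = min_sources - len(usable)
    return gaps


if __name__ == "__main__":
    # Count only indicators other than the state-mandated test.
    for name, missing in under_supported(
        evidence_matrix, excluded={"state test"}
    ).items():
        print(f"{name}: needs {missing} more type(s) of data")
```

Passing `excluded={"state test"}` mirrors the idea of building a menu of indicators that does not rely on state-mandated test results; leaving it out counts every source in the matrix.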
Dr. Christopher H. Tienken is an Assistant Professor at Seton Hall University in the College of Education and Human Services, Department of Education Leadership, Management, and Policy. He is a former public school teacher and administrator. He serves as the Academic Editor of the KDP Record.
Excerpt from Tienken, C. H. (2015). Test use and abuse. Kappa Delta Pi Record, 51(4), 155–158. Used with permission.