Standardized testing
Standardized testing is a testing practice in which tests are designed so that the "questions, conditions for administering, scoring procedures, and interpretations are consistent" (Sylvan Learning, 2006[1]) and are "administered and scored in a predetermined, standard manner" (Popham, 1999[2]). "Standardized" may also describe the reference point against which a test-taker's score is interpreted. Generally, there are two types of standardized tests: norm-referenced tests and criterion-referenced tests,[1] which yield a norm-referenced score or a criterion-referenced score, respectively. Norm-referenced scores compare test-takers to a sample of peers, whereas criterion-referenced scores compare test-takers to a fixed criterion. Such standardization makes scores comparable across test-takers and may help reduce bias.
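The difference between the two kinds of score can be sketched in Python. This is a minimal illustration under stated assumptions, not any testing program's actual scoring procedure: the norm sample, the cut score of 28, and the function names `percentile_rank` and `meets_criterion` are hypothetical, and counting ties as half in the percentile rank is one common convention among several.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, norm_sample: list[float]) -> float:
    """Norm-referenced score: percent of the norm sample below `score`.

    Ties are counted as half, one common percentile-rank convention.
    """
    ranked = sorted(norm_sample)
    below = bisect_left(ranked, score)
    ties = bisect_right(ranked, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ranked)


def meets_criterion(score: float, cut_score: float) -> bool:
    """Criterion-referenced score: compare against a fixed standard."""
    return score >= cut_score


# Hypothetical norm sample of raw scores (illustrative, not real data).
norm_sample = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
print(percentile_rank(25, norm_sample))   # 55.0
print(meets_criterion(25, cut_score=28))  # False
```

The same raw score of 25 places the test-taker slightly above the middle of the hypothetical peer group, yet falls short of the hypothetical criterion; which interpretation applies depends on how the test is referenced.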
Given these definitions, a standardized test is a tool intended to assess student knowledge, attainment, or aptitude in a given subject; it is administered and scored in a standard manner, and its scores are interpreted relative to a normative sample or a criterion. In practice, standardized tests can include multiple-choice and true-false questions as well as short-answer or essay components. Written portions are scored by independent human evaluators who use rubrics, or guidelines describing what a good essay on the subject looks like.[citation needed]
History
The earliest evidence of merit-based standardized testing comes from China during the Han dynasty. The concept of a state ruled by men of ability and virtue grew out of Confucian philosophy. The imperial examinations covered the so-called Six Arts, which included music, archery and horsemanship, arithmetic, writing, and knowledge of the rituals and ceremonies of both public and private life. Later, the Five Studies (military strategy, civil law, revenue and taxation, agriculture, and geography) were added to the examinations.
United States
- First large-scale use of the IQ test in the US during World War I (1917-18)
- The Scholastic Aptitude Test (SAT) developed in 1926
- The Educational Testing Service established in 1948
- Improving America's Schools Act of 1994, a reauthorization of the Elementary and Secondary Education Act, requires standardized testing in public schools
- US Public Law 107-110, known as the No Child Left Behind Act of 2001, further ties public school funding to standardized testing
Standards
Validity and reliability are typically viewed as essential elements for determining the quality of any standardized test. However, professional and practitioner associations have frequently placed these concerns within broader contexts when developing standards and making overall judgments about the quality of a standardized test used in a given context.
Evaluation standards
In the field of evaluation, and in particular educational evaluation, the Joint Committee on Standards for Educational Evaluation[3] has published three sets of standards for evaluations: The Personnel Evaluation Standards[4] in 1988, The Program Evaluation Standards (2nd edition)[5] in 1994, and The Student Evaluation Standards[6] in 2003.
Each publication presents and elaborates a set of standards for use in a variety of educational settings. The standards provide guidelines for designing, implementing, assessing and improving the identified form of evaluation. Each of the standards has been placed in one of four fundamental categories to promote educational evaluations that are proper, useful, feasible, and accurate. In these sets of standards, validity and reliability considerations are covered under the accuracy topic. For example, the student accuracy standards help ensure that student evaluations will provide sound, accurate, and credible information about student learning and performance.
Testing standards
In the field of psychometrics, the Standards for Educational and Psychological Testing[7] group standards on validity and reliability, along with errors of measurement and related considerations, under the general topic of test construction, evaluation, and documentation. The second major topic covers standards related to fairness, including fairness in testing and test use, the rights and responsibilities of test takers, testing individuals of diverse linguistic backgrounds, and testing individuals with disabilities. The third and final major topic covers standards related to testing applications, including the responsibilities of test users, psychological testing and assessment, educational testing and assessment, testing in employment and credentialing, and testing in program evaluation and public policy.
Advantages
One of the main advantages of standardized testing is that it can provide assessments that are psychometrically valid and reliable, with results that are generalizable and replicable.
Another advantage is aggregation. A well-designed standardized test assesses an individual's mastery of a domain of knowledge or skill, and at some level of aggregation those assessments provide useful information. That is, while individual assessments may not be accurate enough for practical purposes, the mean scores of classes, schools, branches of a company, or other groups may well be informative, because averaging over a larger sample reduces random measurement error.
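The effect of sample size on error can be illustrated with the standard error of a mean, which shrinks with the square root of the number of scores averaged. The sketch below is a minimal illustration assuming independent scores and a hypothetical scale with an individual-score standard deviation of 15 points; the function name `standard_error_of_mean` is likewise hypothetical. In real classes and schools, scores are correlated within groups, so the actual reduction in error is smaller than this idealized calculation suggests.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of a group mean of n independent scores with SD `sd`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sd / math.sqrt(n)


# Hypothetical scale on which individual scores have an SD of 15 points.
for n in (1, 25, 100, 400):
    print(n, standard_error_of_mean(15, n))
# 1 15.0
# 25 3.0
# 100 1.5
# 400 0.75
```

Quadrupling the group size halves the standard error, which is why a class or school mean can be informative even when a single student's score is not.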
Although standardized tests are often criticized as unfair, the psychometric standards used in developing them would produce fairer testing if they were applied to other types of testing as well. In particular, the effectiveness of each test item in serving the goal of the test would have to be demonstrated.
Criticism
Standards
Perhaps the most important criticism of standardized testing is that many standardized tests fail to meet the standards of their own field. For example, tests of adult literacy are widely used, although there is little evidence that they assess literacy accurately.
Some of the criticisms are standard psychometric arguments. The validity of standardized tests has been challenged on several grounds. For example, scores on tests of achievement in mathematical problem-solving are often correlated with scores on tests of language ability; this suggests that the mathematics test is actually measuring the linguistic ability required to understand the problems rather than the mathematical ability required to solve them.
Another criticism is that standardized tests assess inadequate samples of skills. However, this criticism cannot validly be made of all standardized tests, although it applies to the majority of tests of any type.
Much of the opposition to standardized tests has centered on their incorrect use. In particular, using standardized tests of academic achievement to assess individual students is questionable given the tests' limited reliability: they are simply not accurate enough, on their own, to provide adequate assessments of individual students.
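In classical test theory, the imprecision of an individual score is expressed by the standard error of measurement, SEM = SD × √(1 − reliability). The sketch below assumes a hypothetical scale with a standard deviation of 15 and a reliability of 0.84, and uses a simple ±1.96 SEM band around the observed score as a rough 95% range; the function names are hypothetical, and more careful treatments center the band on an estimated true score instead.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sd: float, reliability: float,
               z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% band (z = 1.96) around one observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical scale: SD = 15, reliability = 0.84, so SEM is about 6 points.
low, high = score_band(100, sd=15, reliability=0.84)
print(f"observed 100 -> plausible range {low:.1f} to {high:.1f}")
```

Under these assumptions, an observed score of 100 is consistent with a range of roughly 88 to 112, which illustrates why a single score is a weak basis for high-stakes decisions about one student.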
Bias
Standardized tests are also widely criticized as culturally inappropriate for many groups, both in content and in process. Criticism of content usually centers on the differing relevance of the content to people from different cultures – for example, newly arrived immigrants can be expected to have greater difficulty with an intelligence test which asks them to name past leaders of the country to which they have recently immigrated.
Attempts have been made to develop culture-free and culture-fair (culture-neutral) tests of intelligence, but on the whole these attempts have not been successful. Conceptions of intelligence vary widely from culture to culture, and abstracting the few common elements, or what appear to be the few common elements, cannot be depended on to produce a reliable guide to intelligence.
Also significant is the correlation between standardized test performance and social class or wealth. Students who can afford test-preparation courses, which are often expensive and designed specifically to teach test-taking, can gain a large advantage over those who cannot; this advantage reflects the resources available to the student and not necessarily academic merit.
Education
Educational standardized tests tend to become outdated as curricula change.
A common criticism of standardized testing programs in schools is that they encourage teachers to "teach to the test." That is, teachers concentrate on the parts of the curriculum they know will be covered on the test and neglect those that will not. This criticism is certainly worth considering if teachers have foreknowledge of the test and the test is not comprehensive. However, if enough alternative forms of the test are provided, if teachers do not know which form will be used, and if the forms provide a comprehensive sampling of the curriculum, this danger would probably be avoided. Despite the obvious danger of teaching to the test in certain circumstances, little research has investigated how prevalent the practice is or what effects it has. Furthermore, any form of testing will promote teaching to the test if the consequences of testing are serious and the material on the test is known beforehand.
A related criticism is that students whose teachers train them in test-taking skills unrelated to content will perform better than equally accomplished students whose teachers do not. Some simple test-taking skills can improve scores on multiple-choice standardized tests, so this criticism points to a real danger, especially if standardized tests are used (incorrectly) as the sole measures of achievement or skill. However, little research has investigated the prevalence or effects of this training.
Standardized tests are also criticized for emphasizing recall and recognition rather than higher-order cognitive skills. However, this criticism is not generally valid. While many standardized tests do emphasize recall and recognition, many others assess analytical skills.
Alternatives
Large-scale attempts have been made to substitute performance assessment or "authentic" testing for standardized academic testing. Performance tests require actual performance of a skill; for example, instead of answering questions about a science experiment, a student would be required to perform it. However, performance tests have poor reliability simply because they accumulate so little data. Standardized tests have been found to predict scores on performance tests better than other performance tests do.
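The link between the amount of data a test collects and its reliability is captured in classical test theory by the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor k with comparable tasks: k·r / (1 + (k − 1)·r). The sketch below assumes a hypothetical single performance task with reliability 0.5 and assumes the added tasks are parallel to it, an idealization that real performance tasks often do not meet.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Spearman-Brown prophecy formula: predicted reliability of a test
    lengthened by `length_factor` with parallel (comparable) tasks."""
    k, r = length_factor, reliability
    return k * r / (1 + (k - 1) * r)


# Hypothetical: a single performance task with reliability 0.5.
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.5, k), 3))
# 1 0.5
# 2 0.667
# 4 0.8
# 8 0.889
```

Because each performance task is time-consuming, a performance assessment usually contains only a few tasks, whereas a multiple-choice test can include many items in the same testing time; under this formula, that difference in length translates directly into a difference in reliability.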
References
- [1] Sylvan Learning glossary
- [2] Popham, W. J. (1999). Why standardized tests don’t measure educational quality. Educational Leadership, 56(6), 8-15.
- [3] Joint Committee on Standards for Educational Evaluation
- [4] Joint Committee on Standards for Educational Evaluation. (1988). The Personnel Evaluation Standards: How to Assess Systems for Evaluating Educators. Newbury Park, CA: Sage Publications.
- [5] Joint Committee on Standards for Educational Evaluation. (1994). The Program Evaluation Standards (2nd ed.). Newbury Park, CA: Sage Publications.
- [6] Joint Committee on Standards for Educational Evaluation. (2003). The Student Evaluation Standards: How to Improve Evaluations of Students. Newbury Park, CA: Corwin Press.
- [7] The Standards for Educational and Psychological Testing
See also
- Alternative assessment
- Criterion-referenced test
- Education
- List of Admissions Tests
- Norm-referenced test
- Standardized testing and public policy